# dtaidistance.clustering

Time series clustering.

Author: Wannes Meert

Copyright 2017 KU Leuven, DTAI Research Group. Apache License, Version 2.0, see LICENSE for details.

class dtaidistance.clustering.BaseTree(**kwargs)

Base Tree abstract class.

Returns a data structure compatible with the SciPy clustering methods:

A (n-1) by 4 matrix Z is returned. At the i-th iteration, clusters with indices Z[i, 0] and Z[i, 1] are combined to form cluster n + i. A cluster with an index less than n corresponds to one of the original observations. The distance between clusters Z[i, 0] and Z[i, 1] is given by Z[i, 2]. The fourth value Z[i, 3] represents the number of original observations in the newly formed cluster.
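
To make the layout of Z concrete, the sketch below walks a small hand-written linkage matrix and lists the original observations contained in each cluster. The matrix values and the helper name `cluster_members` are illustrative assumptions and not part of dtaidistance.

```python
def cluster_members(Z, n):
    """Map every cluster index to the original observations it contains.

    Z is an (n-1) x 4 linkage matrix given as a list of rows
    [left, right, distance, size]; n is the number of observations.
    """
    members = {i: [i] for i in range(n)}  # indices < n are the observations
    for i, (left, right, _dist, size) in enumerate(Z):
        merged = sorted(members[int(left)] + members[int(right)])
        assert len(merged) == int(size)  # Z[i, 3] counts the observations
        members[n + i] = merged  # iteration i creates cluster n + i
    return members


# Hypothetical linkage matrix for n = 4 observations.
Z = [
    [0, 1, 0.5, 2],  # merge observations 0 and 1 into cluster 4
    [2, 3, 0.8, 2],  # merge observations 2 and 3 into cluster 5
    [4, 5, 2.0, 4],  # merge clusters 4 and 5 into cluster 6
]
members = cluster_members(Z, n=4)
print(members[4])  # [0, 1]
print(members[6])  # [0, 1, 2, 3]
```

The last row always describes the root cluster, so its fourth value equals n.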

plot(filename=None, axes=None, ts_height=10, bottom_margin=2, top_margin=2, ts_left_margin=0, ts_sample_length=1, tr_label_margin=3, tr_left_margin=2, ts_label_margin=0, show_ts_label=None, show_tr_label=None, cmap='viridis_r')

Plot the hierarchy and time series.

Parameters:

- filename – If a filename is passed, the image is written to this file.
- axes – If an axes array is passed, the image is added to this figure. Expects axes[0] and axes[1] to be present.
- ts_height – Height of a time series.
- bottom_margin – Margin on bottom.
- top_margin – Margin on top.
- ts_left_margin – Margin on left of time series image.
- ts_sample_length – Space between two points in the time series.
- tr_label_margin – Margin between tree split and label.
- tr_left_margin – Left margin for tree.
- ts_label_margin – Margin between start of series and label.
- show_ts_label – Show label indices. Boolean, callable or subscriptable object. If it is a callable object, the index of the time series will be given and the return string will be printed.
- show_tr_label – Show tree distances. Boolean, callable or subscriptable object. If it is a callable object, the index of the time series will be given and the return string will be printed.
- cmap – Matplotlib colormap name.

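
The label options accept three kinds of values: a Boolean, a callable, or a subscriptable object. The sketch below shows one plausible way to read each form. The helper `resolve_label` and the sample names are hypothetical, and the handling of True, None and subscriptable values is an assumption for illustration rather than the library's own code.

```python
def resolve_label(option, idx):
    """Illustrative only: turn a show_ts_label-style option into label text."""
    if callable(option):
        return option(idx)  # callable: receives the series index
    if isinstance(option, bool):
        return str(idx) if option else ""  # assumed: True shows the plain index
    if option is None:
        return ""  # assumed: no label when the option is left unset
    return str(option[idx])  # subscriptable: look the label up by index


names = ["sensor A", "sensor B", "sensor C"]  # made-up labels
print(resolve_label(True, 1))  # 1
print(resolve_label(lambda i: f"ts-{i}", 1))  # ts-1
print(resolve_label(names, 1))  # sensor B
```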
class dtaidistance.clustering.Hierarchical(dists_fun, dists_options, max_dist=inf, merge_hook=None, order_hook=None, show_progress=True)

Hierarchical clustering.

Note: This method first computes the entire distance matrix. This is not ideal for extremely large data sets.
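
To get a feel for when the full distance matrix becomes a problem, the back-of-the-envelope sketch below counts the pairwise distances and estimates the memory for a dense matrix. It assumes a dense n by n layout with 8 bytes per value; the actual storage used by a given distance function may differ, and the helper name `distance_matrix_cost` is made up for this example.

```python
def distance_matrix_cost(n_series, bytes_per_value=8):
    """Return (number of unique pairs, bytes for a dense n x n matrix)."""
    pairs = n_series * (n_series - 1) // 2
    dense_bytes = n_series * n_series * bytes_per_value
    return pairs, dense_bytes


for n in (1_000, 10_000, 100_000):
    pairs, dense_bytes = distance_matrix_cost(n)
    print(f"{n:>7} series: {pairs:>13,} pairs, {dense_bytes / 1e9:,.3f} GB")
```

Both quantities grow quadratically, so ten times more series means roughly a hundred times more distance computations and memory.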

Parameters:

- dists_fun – Function to compute pairwise distance matrix between set of series.
- dists_options – Arguments to pass to dists_fun.
- max_dist – Do not merge or cluster series that are further apart than this.
- merge_hook – Function that is called when two series are clustered. The function definition is def merge_hook(from_idx, to_idx, distance), where idx is the index of the series.
- order_hook – Function that is called to decide on the next idx out of all shortest distances.
- show_progress – Use a tqdm progress bar.

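
As a minimal sketch of the merge_hook signature described above, the function below records every merge in a list. The name `record_merge` and the sample calls are illustrative assumptions; in real use the hook would be passed as `merge_hook=record_merge` and called by the clustering itself.

```python
merges = []


def record_merge(from_idx, to_idx, distance):
    """Hook with the documented signature merge_hook(from_idx, to_idx, distance)."""
    merges.append((from_idx, to_idx, distance))


# Stand-in calls with made-up values; the clustering object would issue
# these itself when constructed with merge_hook=record_merge.
record_merge(2, 0, 0.5)
record_merge(1, 0, 1.25)

for from_idx, to_idx, distance in merges:
    print(f"merge {from_idx} -> {to_idx} at distance {distance}")
```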
fit(series)

Merge sequences.

Parameters: series – Iterator over series.

Returns: Dictionary with the prototype indices as keys and, as values, all the indices of the series in that cluster.

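
The dictionary produced by fit can be turned into one cluster label per series. The sketch below assumes a result of the shape described above (prototype index mapped to a collection of member indices); the sample dictionary and the helper name `labels_from_clusters` are made up for illustration.

```python
def labels_from_clusters(cluster_idx, n_series):
    """Return a list where labels[i] is the prototype index of series i.

    Series that appear in no cluster keep the label None.
    """
    labels = [None] * n_series
    for prototype, members in cluster_idx.items():
        for idx in members:
            labels[idx] = prototype
    return labels


# Hypothetical result of fit() for five series grouped into two clusters.
cluster_idx = {0: {0, 1, 3}, 2: {2, 4}}
labels = labels_from_clusters(cluster_idx, n_series=5)
print(labels)  # [0, 0, 2, 0, 2]
```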
class dtaidistance.clustering.HierarchicalTree(model=None, **kwargs)

Wrapper to keep track of the full tree that represents the hierarchical clustering.

Parameters: model – Clustering object, for example an instance of class Hierarchical. If no model is given, the arguments are identical to those of class Hierarchical.

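
A minimal usage sketch, assuming dtaidistance is installed (plus matplotlib for plotting) and that `series` is a list of 1-D NumPy arrays of floats. The wrapper function `cluster_and_plot` and the output filename are hypothetical; `dtw.distance_matrix` is used as the distance function here, and other choices are possible.

```python
def cluster_and_plot(series, filename=None):
    """Cluster series with DTW distances and optionally plot the tree."""
    # Imported here so the sketch can be loaded without the optional
    # dependencies; calling it requires dtaidistance to be installed.
    from dtaidistance import clustering, dtw

    # dtw.distance_matrix computes the pairwise distance matrix; the empty
    # dict means no extra arguments are passed to it.
    model = clustering.Hierarchical(dtw.distance_matrix, {}, show_progress=False)
    # Wrap the model so that the full merge tree is tracked.
    tree = clustering.HierarchicalTree(model)
    cluster_idx = tree.fit(series)
    if filename is not None:
        tree.plot(filename, show_ts_label=True)
    return tree, cluster_idx


# Example call, not executed here:
# tree, cluster_idx = cluster_and_plot(series, "hierarchy.png")
```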
class dtaidistance.clustering.LinkageTree(dists_fun, dists_options)

Hierarchical clustering using the SciPy linkage function.

This is the same algorithm as the one available in Hierarchical, but about 10 times faster. It offers fewer options to steer the clustering (e.g. it is not possible to give weights). It still computes the entire distance matrix first and is thus not ideal for extremely large data sets.