dtaidistance.clustering

Time series clustering.

author: Wannes Meert
copyright: Copyright 2017 KU Leuven, DTAI Research Group.
license: Apache License, Version 2.0, see LICENSE for details.
class dtaidistance.clustering.BaseTree(**kwargs)

Base Tree abstract class.

Returns a data structure compatible with the SciPy clustering methods:

https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.linkage.html

An (n-1) by 4 matrix Z is returned. At the i-th iteration, the clusters with indices Z[i, 0] and Z[i, 1] are combined to form cluster n + i. A cluster with an index less than n corresponds to one of the n original observations. The distance between clusters Z[i, 0] and Z[i, 1] is given by Z[i, 2]. The fourth value Z[i, 3] is the number of original observations in the newly formed cluster.
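To make the row layout concrete, the following minimal sketch walks a hand-written linkage matrix and rebuilds the members of each cluster. The matrix values and the helper name `cluster_members` are illustrative assumptions and not part of dtaidistance or SciPy; only the row convention described above is used.

```python
# Hand-written linkage matrix for n = 4 original observations.
# Each row: [cluster a, cluster b, distance, size of the new cluster].
# The values are made up for illustration.
Z = [
    [0, 1, 0.5, 2],  # iteration 0: merge 0 and 1 into cluster 4
    [2, 3, 0.8, 2],  # iteration 1: merge 2 and 3 into cluster 5
    [4, 5, 2.0, 4],  # iteration 2: merge 4 and 5 into cluster 6
]


def cluster_members(Z):
    """Map every cluster index to the original observations it contains.

    Hypothetical helper: it only relies on the (n-1) by 4 convention.
    """
    n = len(Z) + 1  # n-1 merges imply n original observations
    members = {i: [i] for i in range(n)}  # indices below n are observations
    for i, (a, b, _dist, size) in enumerate(Z):
        merged = members[int(a)] + members[int(b)]
        if len(merged) != int(size):
            raise ValueError("row %d: size column does not match" % i)
        members[n + i] = merged  # the i-th merge creates cluster n + i
    return members


members = cluster_members(Z)
print(members[4])  # members of the first merged cluster
print(members[6])  # the root cluster holds all observations
```

The last row always creates cluster 2n - 2, which contains all n observations.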

plot(filename=None, axes=None, ts_height=10, bottom_margin=2, top_margin=2, ts_left_margin=0, ts_sample_length=1, tr_label_margin=3, tr_left_margin=2, ts_label_margin=0, show_ts_label=None, show_tr_label=None, cmap='viridis_r')

Plot the hierarchy and time series.

Parameters:
  • filename – If a filename is passed, the image is written to this file.
  • axes – If an axes array is passed, the image is added to this figure. Expects axes[0] and axes[1] to be present.
  • ts_height – Height of a time series
  • bottom_margin – Margin on bottom
  • top_margin – Margin on top
  • ts_left_margin – Margin on left of time series image
  • ts_sample_length – Space between two points in the time series
  • tr_label_margin – Margin between tree split and label
  • tr_left_margin – Left margin for tree
  • ts_label_margin – Margin between start of series and label
  • show_ts_label – Show label indices. Boolean, callable or subscriptable object. If it is a callable object, the index of the time series will be given and the return string will be printed.
  • show_tr_label – Show tree distances. Boolean, callable or subscriptable object. If it is a callable object, the index of the time series will be given and the return string will be printed.
  • cmap – Matplotlib colormap name
class dtaidistance.clustering.Hierarchical(dists_fun, dists_options, max_dist=inf, merge_hook=None, order_hook=None, show_progress=True)

Hierarchical clustering.

Note: This method first computes the entire distance matrix. This is not ideal for extremely large data sets.

Parameters:
  • dists_fun – Function to compute pairwise distance matrix between set of series.
  • dists_options – Arguments to pass to dists_fun.
  • max_dist – Do not merge or cluster series that are further apart than this.
  • merge_hook – Function that is called when two series are clustered. The function definition is def merge_hook(from_idx, to_idx, distance), where idx is the index of the series.
  • order_hook – Function that is called to decide on the next idx out of all candidates with the shortest distance.
  • show_progress – Use a tqdm progress bar
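As an illustration of the merge_hook signature given above, here is a minimal sketch of a hook that records every merge. The names `merge_log` and `record_merge` are hypothetical, and the two calls at the end only simulate what a clustering model would issue; they do not run dtaidistance.

```python
# Collected merges as (from_idx, to_idx, distance) tuples.
merge_log = []


def record_merge(from_idx, to_idx, distance):
    """Hook matching the documented signature merge_hook(from_idx, to_idx, distance)."""
    merge_log.append((from_idx, to_idx, distance))


# Simulated calls standing in for the clustering model (illustrative values).
record_merge(3, 1, 0.42)
record_merge(2, 1, 0.97)

for from_idx, to_idx, distance in merge_log:
    print("series %d merged into %d at distance %.2f" % (from_idx, to_idx, distance))
```

Such a function would be passed as merge_hook=record_merge when constructing Hierarchical, after which the log can be inspected once fit has finished.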
fit(series)

Merge sequences.

Parameters: series – Iterator over series.
Returns: Dictionary with the prototype indices as keys and, as values, all the indices of the series in that cluster.
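The returned dictionary can be turned into one label per series when a flat labeling is more convenient. The sketch below assumes a result shaped as described above (prototype index mapped to a collection of member indices); the example dictionary and the helper `flat_labels` are hypothetical and do not come from dtaidistance.

```python
def flat_labels(clusters, n_series):
    """Return a list where position i holds the prototype index of series i.

    Hypothetical helper; series not mentioned in any cluster keep None.
    """
    labels = [None] * n_series
    for prototype, member_indices in clusters.items():
        # Label the prototype itself too, in case it is not listed as a member.
        labels[prototype] = prototype
        for idx in member_indices:
            labels[idx] = prototype
    return labels


# Illustrative result for four series: series 0, 2 and 3 share prototype 0.
clusters = {0: {0, 2, 3}, 1: {1}}
labels = flat_labels(clusters, 4)
print(labels)
```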
class dtaidistance.clustering.HierarchicalTree(model=None, **kwargs)

Wrapper to keep track of the full tree that represents the hierarchical clustering.

Parameters: model – Clustering object, for example an instance of class Hierarchical. If no model is given, the arguments are identical to those of class Hierarchical.
class dtaidistance.clustering.LinkageTree(dists_fun, dists_options)

Hierarchical clustering using the SciPy linkage function.

This is the same algorithm as the one available in Hierarchical, but about 10 times faster. It offers fewer options to steer the clustering (e.g. no possibility to give weights). It still computes the entire distance matrix first and is thus not ideal for extremely large data sets.