dtaidistance.clustering.hierarchical

Time series clustering using hierarchical clustering.

author: Wannes Meert
copyright: Copyright 2017-2022 KU Leuven, DTAI Research Group.
license: Apache License, Version 2.0, see LICENSE for details.
class dtaidistance.clustering.hierarchical.BaseTree(**kwargs)

Base Tree abstract class.

Returns a data structure compatible with the SciPy clustering methods:

https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.linkage.html

An (n-1) by 4 matrix Z is returned. At the i-th iteration, clusters with indices Z[i, 0] and Z[i, 1] are combined to form cluster n + i. A cluster with an index less than n corresponds to one of the original observations. The distance between clusters Z[i, 0] and Z[i, 1] is given by Z[i, 2]. The fourth value Z[i, 3] represents the number of original observations in the newly formed cluster.
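
To make the row layout concrete, the sketch below walks a small hand-written matrix and recovers which original observations end up in each cluster. The matrix values and the helper name members_per_cluster are made up for illustration; they are not part of dtaidistance or SciPy.

```python
# Hypothetical linkage matrix for n = 4 series, in the (n-1) x 4 layout
# described above: [left index, right index, distance, cluster size].
Z = [
    [0, 1, 0.5, 2],  # iteration 0: observations 0 and 1 form cluster 4
    [2, 3, 0.8, 2],  # iteration 1: observations 2 and 3 form cluster 5
    [4, 5, 2.1, 4],  # iteration 2: clusters 4 and 5 form cluster 6
]


def members_per_cluster(Z):
    """Map every cluster index to the sorted original observations it contains."""
    n = len(Z) + 1  # number of original observations
    members = {i: [i] for i in range(n)}
    for i, (left, right, _dist, size) in enumerate(Z):
        merged = sorted(members[int(left)] + members[int(right)])
        # The fourth column must equal the number of observations merged so far.
        assert len(merged) == int(size)
        members[n + i] = merged
    return members


members = members_per_cluster(Z)
print(members[4])  # [0, 1]
print(members[6])  # [0, 1, 2, 3]
```

Indices below n (here 0 to 3) are the original series; indices n and above are the clusters created at each iteration.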

get_linkage(node)
maxnode
plot(filename=None, axes=None, ts_height=10, bottom_margin=2, top_margin=2, ts_left_margin=0, ts_sample_length=1, tr_label_margin=3, tr_left_margin=2, ts_label_margin=0, show_ts_label=None, show_tr_label=None, cmap='viridis_r', ts_color=None)

Plot the hierarchy and time series.

Parameters:
  • filename – If a filename is passed, the image is written to this file.
  • axes – If an axes array is passed, the image is added to this figure. Expects axes[0] and axes[1] to be present.
  • ts_height – Height of a time series
  • bottom_margin – Margin on bottom
  • top_margin – Margin on top
  • ts_left_margin – Margin on left of time series image
  • ts_sample_length – Space between two points in the time series
  • tr_label_margin – Margin between tree split and label
  • tr_left_margin – Left margin for tree
  • ts_label_margin – Margin between start of series and label
  • show_ts_label – Show label indices. Boolean, callable or subscriptable object. If it is a callable object, the index of the time series will be given and the return string will be printed.
  • show_tr_label – Show tree distances. Boolean, callable or subscriptable object. If it is a callable object, the index of the time series will be given and the return string will be printed.
  • cmap – Matplotlib colormap name
  • ts_color – function that takes the index and returns a color (compatible with the matplotlib.color color argument)
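
As a minimal sketch of the callable forms accepted by show_ts_label and ts_color, the two functions below each take a series index and return a string label or a Matplotlib color name. The function names, the label text and the color choice are assumptions for illustration only, and the commented plot call assumes an already fitted tree object named model.

```python
def ts_label(idx):
    """Hypothetical label callable: receives a series index, returns a string."""
    return f"series {idx}"


def ts_color(idx):
    """Hypothetical color callable: alternate two named Matplotlib colors."""
    return "tab:blue" if idx % 2 == 0 else "tab:orange"


# What the callables would return for the first three series.
labels = [ts_label(i) for i in range(3)]
colors = [ts_color(i) for i in range(3)]

# Assumed usage with an already fitted tree object named `model`:
# model.plot("hierarchy.png", show_ts_label=ts_label, ts_color=ts_color)
```

Since show_ts_label also accepts a subscriptable object, a plain list of strings indexed by series position should serve the same purpose as ts_label.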
to_dot()
class dtaidistance.clustering.hierarchical.Hierarchical(dists_fun, dists_options, max_dist=inf, merge_hook=None, order_hook=None, show_progress=True)

Hierarchical clustering.

Note: This method first computes the entire distance matrix. This is not ideal for extremely large data sets.

Parameters:
  • dists_fun – Function to compute the pairwise distance matrix between a set of series.
  • dists_options – Arguments to pass to dists_fun.
  • max_dist – Do not merge or cluster series that are further apart than this.
  • merge_hook – Function that is called when two series are clustered. The function definition is def merge_hook(from_idx, to_idx, distance), where idx is the index of the series.
  • order_hook – Function that is called to decide on the next idx out of all shortest distances
  • show_progress – Use a tqdm progress bar
Returns:

Cluster indices
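
The merge_hook signature documented above can be sketched as a small recorder that stores every merge it is told about. The calls at the bottom are simulated with made-up indices and distances and stand in for what the clustering would pass; the commented constructor call assumes dists_fun and dists_options are defined elsewhere.

```python
merge_log = []


def merge_hook(from_idx, to_idx, distance):
    """Record each merge, using the documented (from_idx, to_idx, distance) signature."""
    merge_log.append((from_idx, to_idx, distance))


# Simulated calls with made-up values, standing in for the clustering loop.
merge_hook(3, 1, 0.42)
merge_hook(2, 0, 0.57)

print(merge_log)

# Assumed usage:
# model = Hierarchical(dists_fun, dists_options, merge_hook=merge_hook)
```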

fit(series)

Merge sequences.

Parameters: series – Sequence over series.
Returns: Dictionary with the prototype indices as keys and, as values, all the indices of the series in that cluster.
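
A dictionary of this shape can be turned into one label per series, which is often more convenient for downstream code. The sketch below uses a hypothetical result for five series and assumes the values are collections of integer indices; the helper name flat_labels is not part of the library.

```python
# Hypothetical fit() result for five series: prototype index -> member indices.
cluster_idx = {0: {0, 2, 3}, 1: {1, 4}}


def flat_labels(cluster_idx, n_series):
    """Return a list where position i holds the prototype index of series i."""
    labels = [None] * n_series  # None marks a series not listed in any cluster
    for prototype, members in cluster_idx.items():
        for idx in members:
            labels[idx] = prototype
    return labels


labels = flat_labels(cluster_idx, 5)
print(labels)  # [0, 1, 0, 0, 1]
```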
plot(*args, **kwargs)
class dtaidistance.clustering.hierarchical.HierarchicalTree(model=None, **kwargs)

Wrapper to keep track of the full tree that represents the hierarchical clustering.

The linkage tree is available in self.linkage.

Parameters: model – Clustering object, for example an instance of class Hierarchical. If no model is given, the arguments are identical to those of class Hierarchical.
fit(series, *args, **kwargs)

Fit a hierarchical clustering tree.

All arguments are forwarded to the model that was passed to __init__. The linkage tree is also available in self.linkage.

Parameters: series – Sequence over time series
Returns: Linkage data structure
get_linkage(node)
maxnode
plot(filename=None, axes=None, ts_height=10, bottom_margin=2, top_margin=2, ts_left_margin=0, ts_sample_length=1, tr_label_margin=3, tr_left_margin=2, ts_label_margin=0, show_ts_label=None, show_tr_label=None, cmap='viridis_r', ts_color=None)

Plot the hierarchy and time series.

Parameters:
  • filename – If a filename is passed, the image is written to this file.
  • axes – If an axes array is passed, the image is added to this figure. Expects axes[0] and axes[1] to be present.
  • ts_height – Height of a time series
  • bottom_margin – Margin on bottom
  • top_margin – Margin on top
  • ts_left_margin – Margin on left of time series image
  • ts_sample_length – Space between two points in the time series
  • tr_label_margin – Margin between tree split and label
  • tr_left_margin – Left margin for tree
  • ts_label_margin – Margin between start of series and label
  • show_ts_label – Show label indices. Boolean, callable or subscriptable object. If it is a callable object, the index of the time series will be given and the return string will be printed.
  • show_tr_label – Show tree distances. Boolean, callable or subscriptable object. If it is a callable object, the index of the time series will be given and the return string will be printed.
  • cmap – Matplotlib colormap name
  • ts_color – function that takes the index and returns a color (compatible with the matplotlib.color color argument)
to_dot()
class dtaidistance.clustering.hierarchical.Hooks
static create_orderhook(weights)
static create_weighthook(weights, series)
class dtaidistance.clustering.hierarchical.LinkageTree(dists_fun, dists_options=None, method='complete')

Hierarchical clustering using the SciPy linkage function.

The linkage tree is available in self.linkage.

This is the same algorithm as the one available in Hierarchical, but about 10 times faster. It offers fewer options to steer the clustering (e.g. there is no possibility to give weights). It still computes the entire distance matrix first and is thus not ideal for extremely large data sets.

Parameters:
  • dists_fun – Distance function that computes the pairwise distance matrix, e.g. dtw.distance_matrix
  • dists_options – Options passed to dists_fun
  • method – Linkage method (see scipy.cluster.hierarchy.linkage)
fit(series)

Fit a hierarchical clustering tree.

The linkage tree is also available in self.linkage.

Parameters: series – Sequence over time series
Returns: Linkage data structure
get_linkage(node)
maxnode
plot(filename=None, axes=None, ts_height=10, bottom_margin=2, top_margin=2, ts_left_margin=0, ts_sample_length=1, tr_label_margin=3, tr_left_margin=2, ts_label_margin=0, show_ts_label=None, show_tr_label=None, cmap='viridis_r', ts_color=None)

Plot the hierarchy and time series.

Parameters:
  • filename – If a filename is passed, the image is written to this file.
  • axes – If an axes array is passed, the image is added to this figure. Expects axes[0] and axes[1] to be present.
  • ts_height – Height of a time series
  • bottom_margin – Margin on bottom
  • top_margin – Margin on top
  • ts_left_margin – Margin on left of time series image
  • ts_sample_length – Space between two points in the time series
  • tr_label_margin – Margin between tree split and label
  • tr_left_margin – Left margin for tree
  • ts_label_margin – Margin between start of series and label
  • show_ts_label – Show label indices. Boolean, callable or subscriptable object. If it is a callable object, the index of the time series will be given and the return string will be printed.
  • show_tr_label – Show tree distances. Boolean, callable or subscriptable object. If it is a callable object, the index of the time series will be given and the return string will be printed.
  • cmap – Matplotlib colormap name
  • ts_color – function that takes the index and returns a color (compatible with the matplotlib.color color argument)
to_dot()