dtaidistance.clustering.hierarchical
Time series clustering using hierarchical clustering.
- author:
Wannes Meert
- copyright:
Copyright 2017-2022 KU Leuven, DTAI Research Group.
- license:
Apache License, Version 2.0, see LICENSE for details.
- class dtaidistance.clustering.hierarchical.BaseTree(**kwargs)
Base Tree abstract class.
Returns a data structure compatible with the SciPy clustering methods:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.linkage.html
An (n-1)-by-4 matrix Z is returned. At the i-th iteration, clusters with indices Z[i, 0] and Z[i, 1] are combined to form cluster n + i. A cluster with an index less than n corresponds to one of the original observations. The distance between clusters Z[i, 0] and Z[i, 1] is given by Z[i, 2]. The fourth value Z[i, 3] represents the number of original observations in the newly formed cluster.
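To make the linkage format concrete, the following sketch walks a linkage matrix back to the original observations contained in a cluster. The matrix Z below is made-up example data for n = 4 observations, and cluster_members is a hypothetical helper, not part of dtaidistance.

```python
def cluster_members(Z, n, cluster):
    """Return the sorted original-observation indices inside `cluster`.

    Z is an (n-1) x 4 linkage matrix: row i merges clusters Z[i][0] and
    Z[i][1] into the new cluster n + i. Indices below n are observations.
    """
    stack = [int(cluster)]
    members = []
    while stack:
        c = stack.pop()
        if c < n:
            members.append(c)        # an original observation (leaf)
        else:
            row = Z[c - n]           # the merge step that created cluster c
            stack.extend((int(row[0]), int(row[1])))
    return sorted(members)


# Hypothetical linkage for 4 observations.
Z = [
    [0, 1, 0.5, 2],  # step 0: observations 0 and 1 form cluster 4
    [2, 3, 0.8, 2],  # step 1: observations 2 and 3 form cluster 5
    [4, 5, 2.0, 4],  # step 2: clusters 4 and 5 form cluster 6 (the root)
]
print(cluster_members(Z, 4, 5))  # members of cluster 5
print(cluster_members(Z, 4, 6))  # members of the root cluster
```

For any merged cluster c, the length of the returned list equals Z[c - n][3], the count stored in the fourth column.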
- get_linkage(node)
- property maxnode
- plot(filename=None, axes=None, ts_height=10, bottom_margin=2, top_margin=2, ts_left_margin=0, ts_sample_length=1, tr_label_margin=3, tr_left_margin=2, ts_label_margin=0, show_ts_label=None, show_tr_label=None, cmap='viridis_r', ts_color=None)
Plot the hierarchy and time series.
- Parameters:
filename – If a filename is passed, the image is written to this file.
axes – If an axes array is passed, the image is added to that figure. Expects axes[0] and axes[1] to be present.
ts_height – Height of a time series
bottom_margin – Margin on bottom
top_margin – Margin on top
ts_left_margin – Margin on left of time series image
ts_sample_length – Space between two points in the time series
tr_label_margin – Margin between tree split and label
tr_left_margin – Left margin for tree
ts_label_margin – Margin between start of series and label
show_ts_label – Show label indices. Boolean, callable or subscriptable object. If it is a callable object, the index of the time series will be given and the return string will be printed.
show_tr_label – Show tree distances. Boolean, callable or subscriptable object. If it is a callable object, the index of the time series will be given and the return string will be printed.
cmap – Matplotlib colormap name
ts_color – function that takes the index and returns a color (compatible with the matplotlib.color color argument)
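The show_ts_label and show_tr_label parameters accept a Boolean, a callable or a subscriptable object. The sketch below shows one way such a value can be normalized into a single index-to-text function, together with a ts_color callable. It is an illustration, not dtaidistance's code: as_label_fn, pick_color and names are hypothetical, and treating True as "print the index as a string" is an assumption.

```python
from typing import Any, Callable


def as_label_fn(show: Any) -> Callable[[int], str]:
    """Turn a Boolean, callable or subscriptable label spec into idx -> str."""
    if show is None or show is False:
        return lambda idx: ""                # no labels at all
    if show is True:
        return lambda idx: str(idx)          # assumption: show the raw index
    if callable(show):
        return lambda idx: str(show(idx))    # user function builds the text
    if hasattr(show, "__getitem__"):
        return lambda idx: str(show[idx])    # list/dict lookup by index
    raise TypeError(f"Unsupported label specification: {show!r}")


def pick_color(idx: int) -> str:
    """Example ts_color callable: alternate two Matplotlib named colors."""
    return "tab:red" if idx % 2 else "tab:blue"


names = ["walk", "run", "jump"]
label = as_label_fn(names)
print(label(1), pick_color(1))
# With a fitted model this would be passed as, for example:
# model.plot("hierarchy.png", show_ts_label=names, ts_color=pick_color)
```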
- to_dot()
- class dtaidistance.clustering.hierarchical.Hierarchical(dists_fun, dists_options, max_dist=inf, merge_hook=None, order_hook=None, show_progress=True)
Hierarchical clustering.
Note: This method first computes the entire distance matrix. This is not ideal for extremely large data sets.
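A back-of-the-envelope estimate shows why the full distance matrix becomes a bottleneck. This sketch assumes the matrix is stored as a dense n-by-n array of 64-bit floats; the storage layout is an assumption, not something stated in this reference. The helper names are hypothetical.

```python
def dense_matrix_bytes(n_series: int, bytes_per_value: int = 8) -> int:
    """Memory for a dense n x n matrix of float64 distances."""
    return n_series * n_series * bytes_per_value


def pairwise_count(n_series: int) -> int:
    """Number of distinct series pairs, i.e. distance computations needed."""
    return n_series * (n_series - 1) // 2


n = 10_000
print(dense_matrix_bytes(n))  # about 0.8 GB of memory
print(pairwise_count(n))      # about 50 million DTW computations
```

Both quantities grow quadratically, so doubling the number of series roughly quadruples memory use and computation time.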
- Parameters:
dists_fun – Function to compute pairwise distance matrix between set of series.
dists_options – Arguments to pass to dists_fun.
max_dist – Do not merge or cluster series that are further apart than this.
merge_hook – Function that is called when two series are clustered. The expected signature is def merge_hook(from_idx, to_idx, distance), where from_idx and to_idx are series indices.
order_hook – Function that is called to decide on the next idx out of all shortest distances
show_progress – Use a tqdm progress bar
- Returns:
Cluster indices
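The interplay between max_dist, merge_hook and the returned clusters can be illustrated with a simplified greedy merge. This is not the library's implementation: greedy_merge is hypothetical, order_hook is not modeled, and the rules that distances are always measured between cluster prototypes and that the smaller index stays the prototype are assumptions chosen to keep the sketch short.

```python
import math
from typing import Callable, Dict, List, Optional, Set


def greedy_merge(dists: List[List[float]],
                 max_dist: float = math.inf,
                 merge_hook: Optional[Callable[[int, int, float], None]] = None
                 ) -> Dict[int, Set[int]]:
    """Simplified prototype-based agglomerative merging (illustration only).

    dists[i][j] with i < j holds the distance between series i and j.
    Returns {prototype index: set of member indices}.
    """
    clusters = {i: {i} for i in range(len(dists))}
    while len(clusters) > 1:
        protos = sorted(clusters)
        # Find the closest pair of remaining prototypes.
        best = None
        for pos, a in enumerate(protos):
            for b in protos[pos + 1:]:
                if best is None or dists[a][b] < best[2]:
                    best = (a, b, dists[a][b])
        to_idx, from_idx, d = best
        if d > max_dist:
            break  # the closest remaining pair is too far apart: stop merging
        if merge_hook is not None:
            merge_hook(from_idx, to_idx, d)
        clusters[to_idx] |= clusters.pop(from_idx)
    return clusters


points = [0.0, 0.1, 5.0, 5.2]
dists = [[abs(p - q) for q in points] for p in points]
log = []
result = greedy_merge(dists, max_dist=1.0,
                      merge_hook=lambda f, t, d: log.append((f, t)))
print(result)  # two clusters: {0, 1} and {2, 3}
print(log)     # merge events as (from_idx, to_idx)
```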
- fit(series)
Merge sequences.
- Parameters:
series – Sequence over series.
- Returns:
Dictionary with the prototype indices as keys and, as values, the indices of all series in that cluster.
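Many downstream tools expect one label per series rather than a dictionary of clusters. A minimal sketch of that conversion follows; to_labels is a hypothetical helper, and marking series that appear in no cluster with -1 is a choice made here, not documented library behavior.

```python
from typing import Dict, Iterable, List


def to_labels(clusters: Dict[int, Iterable[int]], n_series: int) -> List[int]:
    """Map each series index to the prototype index of its cluster."""
    labels = [-1] * n_series           # -1: series not found in any cluster
    for prototype, members in clusters.items():
        for idx in members:
            labels[idx] = prototype
    return labels


clusters = {0: {0, 1}, 2: {2, 3}}      # example shaped like the fit() result
print(to_labels(clusters, 5))
```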
- plot(*args, **kwargs)
- class dtaidistance.clustering.hierarchical.HierarchicalTree(model=None, **kwargs)
Wrapper to keep track of the full tree that represents the hierarchical clustering.
The linkage tree is available in self.linkage.
- Parameters:
model – Clustering object, for example of class Hierarchical. If no model is given, the arguments are identical to those of class Hierarchical.
- fit(series, *args, **kwargs)
Fit a hierarchical clustering tree.
All arguments are passed on when calling the model that was passed to __init__. The linkage tree is also available in self.linkage.
- Parameters:
series – Sequence over time series
- Returns:
Linkage datastructure
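Because the linkage follows the SciPy format, it can in principle be handed to SciPy routines such as scipy.cluster.hierarchy.fcluster to obtain flat clusters. To keep the sketch self-contained, the linkage below is built with SciPy's own linkage function instead of a fitted HierarchicalTree; the assumption that model.linkage passes SciPy's validity checks in the same way is not verified here.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Stand-in for HierarchicalTree.linkage: a SciPy linkage on four 1-D points.
points = np.array([[0.0], [0.1], [5.0], [5.2]])
Z = linkage(points, method="complete")

# Flat clusters: cut the tree so that merges above distance 1.0 are undone.
labels = fcluster(Z, t=1.0, criterion="distance")
print(labels)  # one flat-cluster id per original series
```

With a fitted model, the analogous call would be fcluster(model.linkage, t=..., criterion="distance").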
- get_linkage(node)
- property maxnode
- plot(filename=None, axes=None, ts_height=10, bottom_margin=2, top_margin=2, ts_left_margin=0, ts_sample_length=1, tr_label_margin=3, tr_left_margin=2, ts_label_margin=0, show_ts_label=None, show_tr_label=None, cmap='viridis_r', ts_color=None)
Plot the hierarchy and time series.
- Parameters:
filename – If a filename is passed, the image is written to this file.
axes – If an axes array is passed, the image is added to that figure. Expects axes[0] and axes[1] to be present.
ts_height – Height of a time series
bottom_margin – Margin on bottom
top_margin – Margin on top
ts_left_margin – Margin on left of time series image
ts_sample_length – Space between two points in the time series
tr_label_margin – Margin between tree split and label
tr_left_margin – Left margin for tree
ts_label_margin – Margin between start of series and label
show_ts_label – Show label indices. Boolean, callable or subscriptable object. If it is a callable object, the index of the time series will be given and the return string will be printed.
show_tr_label – Show tree distances. Boolean, callable or subscriptable object. If it is a callable object, the index of the time series will be given and the return string will be printed.
cmap – Matplotlib colormap name
ts_color – function that takes the index and returns a color (compatible with the matplotlib.color color argument)
- to_dot()
- class dtaidistance.clustering.hierarchical.Hooks
- static create_orderhook(weights)
- static create_weighthook(weights, series)
- class dtaidistance.clustering.hierarchical.LinkageTree(dists_fun, dists_options=None, method='complete')
Hierarchical clustering using the Scipy linkage function.
The linkage tree is available in self.linkage.
This is the same algorithm as in Hierarchical, but faster (about 10 times). It offers fewer options to steer the clustering (e.g., no possibility to give weights). It still computes the entire distance matrix first and is thus not ideal for extremely large data sets.
- Parameters:
dists_fun – Function to compute the pairwise distance matrix between series, e.g. dtw.distance_matrix_fast
dists_options – Options passed to dists_fun
method – Linkage method (see scipy.cluster.hierarchy.linkage)
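The idea behind LinkageTree, computing all pairwise distances once and letting SciPy build the tree, can be sketched as follows. Assumptions: a plain-Python DTW (square root of the minimal summed squared differences, no window or pruning) stands in for dtaidistance's optimized distance functions, and dtw_distance and linkage_tree are hypothetical names, not part of the library.

```python
import itertools
import math

import numpy as np
from scipy.cluster.hierarchy import linkage


def dtw_distance(s, t):
    """Plain O(len(s) * len(t)) DTW returning sqrt of the summed squared diffs."""
    n, m = len(s), len(t)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (s[i - 1] - t[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


def linkage_tree(series, method="complete"):
    """Build a SciPy linkage matrix from pairwise DTW distances."""
    # Condensed form: pairs (i, j) with i < j in row-major order, which is
    # the layout scipy.cluster.hierarchy.linkage expects for distances.
    condensed = np.array([dtw_distance(a, b)
                          for a, b in itertools.combinations(series, 2)])
    return linkage(condensed, method=method)


series = [
    [0, 0, 1, 2, 1, 0],
    [0, 1, 2, 1, 0, 0],  # time-shifted copy of the first series
    [3, 3, 3, 3, 3, 3],
    [3, 3, 4, 3, 3, 3],
]
Z = linkage_tree(series)
print(Z)
```

Because DTW absorbs the time shift, the first two series merge at distance 0 before any other pair.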
- fit(series)
Fit a hierarchical clustering tree.
The linkage tree is also available in self.linkage.
- Parameters:
series – Sequence over time series
- Returns:
Linkage datastructure
- get_linkage(node)
- property maxnode
- plot(filename=None, axes=None, ts_height=10, bottom_margin=2, top_margin=2, ts_left_margin=0, ts_sample_length=1, tr_label_margin=3, tr_left_margin=2, ts_label_margin=0, show_ts_label=None, show_tr_label=None, cmap='viridis_r', ts_color=None)
Plot the hierarchy and time series.
- Parameters:
filename – If a filename is passed, the image is written to this file.
axes – If an axes array is passed, the image is added to that figure. Expects axes[0] and axes[1] to be present.
ts_height – Height of a time series
bottom_margin – Margin on bottom
top_margin – Margin on top
ts_left_margin – Margin on left of time series image
ts_sample_length – Space between two points in the time series
tr_label_margin – Margin between tree split and label
tr_left_margin – Left margin for tree
ts_label_margin – Margin between start of series and label
show_ts_label – Show label indices. Boolean, callable or subscriptable object. If it is a callable object, the index of the time series will be given and the return string will be printed.
show_tr_label – Show tree distances. Boolean, callable or subscriptable object. If it is a callable object, the index of the time series will be given and the return string will be printed.
cmap – Matplotlib colormap name
ts_color – function that takes the index and returns a color (compatible with the matplotlib.color color argument)
- to_dot()