dtaidistance.clustering.kmeans¶

(requires version 2.2.0 or higher)

Time series clustering using k-means and Barycenter averages.

author:	Wannes Meert
copyright:	Copyright 2020-2022 KU Leuven, DTAI Research Group.
license:	Apache License, Version 2.0, see LICENSE for details.

class dtaidistance.clustering.kmeans.KMeans(k, max_it=10, max_dba_it=10, thr=0.0001, drop_stddev=None, nb_prob_samples=None, dists_options=None, show_progress=True, initialize_with_kmedoids=False, initialize_with_kmeanspp=True, initialize_sample_size=None)¶

K-means clustering algorithm for time series using Dynamic Barycenter Averaging.

Parameters:

k – Number of components
max_it – Maximal interations for K-means
max_dba_it – Maximal iterations for the Dynamic Barycenter Averaging.
thr – Convergence is achieved if the averaging iterations differ less then this threshold
drop_stddev – When computing the average series per cluster, ignore the instances that are further away than stddev*drop_stddev from the prototype (this is a gradual effect, the algorithm starts with drop_stddev is 3). This is related to robust k-means approaches that use trimming functions.
nb_prob_samples – Probabilistically sample best path this number of times.
dists_options –
show_progress –
initialize_with_kmedoids – Cluster a sample of the dataset first using K-medoids.
initialize_with_kmeanspp – Use k-means++
initialize_sample_size – How many samples to use for initialization with K-medoids or K-means++. Defaults are k*20 for K-medoid and 2+log(k) for k-means++.

fit(series, use_c=False, use_parallel=True)¶

Perform K-means clustering.

Parameters:	series – Container with series use_c – Use the C-library (only available if package is compiled) use_parallel – Use multipool for parallelization
Returns:	cluster indices, number of iterations If the number of iterations is equal to max_it, the clustering did not converge.

fit_fast(series)¶

kmeansplusplus_centers(series, use_c=False)¶

Better initialization for K-Means.

Arthur, D., and S. Vassilvitskii. “k-means++: the, advantages of careful seeding. In, SODA’07: Proceedings of the.” eighteenth annual ACM-SIAM symposium on Discrete, algorithms.

Procedure (in paper):

1a. Choose an initial center c_1 uniformly at random from X.
1b. Choose the next center c_i , selecting c_i = x′∈X with probability D(x’)^2/sum(D(x)^2, x∈X).
1c. Repeat Step 1b until we have chosen a total of k centers.
(2-4. Proceed as with the standard k-means algorithm.)

Extension (in conclusion):

Also, experiments showed that k-means++ generally performed better if it selected several new centers during each iteration, and then greedily chose the one that decreased φ as much as possible.

Detail (in original code):

numLocalTries==2+log(k)

Parameters:	series – use_c –
Returns:

kmedoids_centers(series, use_c=False)¶

plot(filename=None, axes=None, ts_height=10, bottom_margin=2, top_margin=2, ts_left_margin=0, ts_sample_length=1, tr_label_margin=3, tr_left_margin=2, ts_label_margin=0, show_ts_label=None, show_tr_label=None, cmap='viridis_r', ts_color=None)¶