dtaidistance.clustering.kmeans¶
(requires version 2.2.0 or higher)
Time series clustering using kmeans and Barycenter averages.
author:  Wannes Meert 

copyright:  Copyright 20202022 KU Leuven, DTAI Research Group. 
license:  Apache License, Version 2.0, see LICENSE for details. 

class
dtaidistance.clustering.kmeans.
KMeans
(k, max_it=10, max_dba_it=10, thr=0.0001, drop_stddev=None, nb_prob_samples=None, dists_options=None, show_progress=True, initialize_with_kmedoids=False, initialize_with_kmeanspp=True, initialize_sample_size=None)¶ Kmeans clustering algorithm for time series using Dynamic Barycenter Averaging.
Parameters:  k – Number of components
 max_it – Maximal interations for Kmeans
 max_dba_it – Maximal iterations for the Dynamic Barycenter Averaging.
 thr – Convergence is achieved if the averaging iterations differ less then this threshold
 drop_stddev – When computing the average series per cluster, ignore the instances that are further away than stddev*drop_stddev from the prototype (this is a gradual effect, the algorithm starts with drop_stddev is 3). This is related to robust kmeans approaches that use trimming functions.
 nb_prob_samples – Probabilistically sample best path this number of times.
 dists_options –
 show_progress –
 initialize_with_kmedoids – Cluster a sample of the dataset first using Kmedoids.
 initialize_with_kmeanspp – Use kmeans++
 initialize_sample_size – How many samples to use for initialization with Kmedoids or Kmeans++. Defaults are k*20 for Kmedoid and 2+log(k) for kmeans++.

fit
(series, use_c=False, use_parallel=True)¶ Perform Kmeans clustering.
Parameters:  series – Container with series
 use_c – Use the Clibrary (only available if package is compiled)
 use_parallel – Use multipool for parallelization
Returns: cluster indices, number of iterations If the number of iterations is equal to max_it, the clustering did not converge.

fit_fast
(series)¶

kmeansplusplus_centers
(series, use_c=False)¶ Better initialization for KMeans.
Arthur, D., and S. Vassilvitskii. “kmeans++: the, advantages of careful seeding. In, SODA’07: Proceedings of the.” eighteenth annual ACMSIAM symposium on Discrete, algorithms.Procedure (in paper):
 1a. Choose an initial center c_1 uniformly at random from X.
 1b. Choose the next center c_i , selecting c_i = x′∈X with probability D(x’)^2/sum(D(x)^2, x∈X).
 1c. Repeat Step 1b until we have chosen a total of k centers.
 (24. Proceed as with the standard kmeans algorithm.)
Extension (in conclusion):
 Also, experiments showed that kmeans++ generally performed better if it selected several new centers during each iteration, and then greedily chose the one that decreased φ as much as possible.
Detail (in original code):
 numLocalTries==2+log(k)
Parameters:  series –
 use_c –
Returns:

kmedoids_centers
(series, use_c=False)¶

plot
(filename=None, axes=None, ts_height=10, bottom_margin=2, top_margin=2, ts_left_margin=0, ts_sample_length=1, tr_label_margin=3, tr_left_margin=2, ts_label_margin=0, show_ts_label=None, show_tr_label=None, cmap='viridis_r', ts_color=None)¶