(requires version 2.2.0 or higher)
Time series clustering using k-means and Barycenter averages.
|copyright:||Copyright 2020-2022 KU Leuven, DTAI Research Group.|
|license:||Apache License, Version 2.0, see LICENSE for details.|
KMeans(k, max_it=10, max_dba_it=10, thr=0.0001, drop_stddev=None, nb_prob_samples=None, dists_options=None, show_progress=True, initialize_with_kmedoids=False, initialize_with_kmeanspp=True, initialize_sample_size=None)¶
K-means clustering algorithm for time series using Dynamic Barycenter Averaging.
- k – Number of components
- max_it – Maximal interations for K-means
- max_dba_it – Maximal iterations for the Dynamic Barycenter Averaging.
- thr – Convergence is achieved if the averaging iterations differ less then this threshold
- drop_stddev – When computing the average series per cluster, ignore the instances that are further away than stddev*drop_stddev from the prototype (this is a gradual effect, the algorithm starts with drop_stddev is 3). This is related to robust k-means approaches that use trimming functions.
- nb_prob_samples – Probabilistically sample best path this number of times.
- dists_options –
- show_progress –
- initialize_with_kmedoids – Cluster a sample of the dataset first using K-medoids.
- initialize_with_kmeanspp – Use k-means++
- initialize_sample_size – How many samples to use for initialization with K-medoids or K-means++. Defaults are k*20 for K-medoid and 2+log(k) for k-means++.
fit(series, use_c=False, use_parallel=True)¶
Perform K-means clustering.
- series – Container with series
- use_c – Use the C-library (only available if package is compiled)
- use_parallel – Use multipool for parallelization
cluster indices, number of iterations If the number of iterations is equal to max_it, the clustering did not converge.
Better initialization for K-Means.Arthur, D., and S. Vassilvitskii. “k-means++: the, advantages of careful seeding. In, SODA’07: Proceedings of the.” eighteenth annual ACM-SIAM symposium on Discrete, algorithms.
Procedure (in paper):
- 1a. Choose an initial center c_1 uniformly at random from X.
- 1b. Choose the next center c_i , selecting c_i = x′∈X with probability D(x’)^2/sum(D(x)^2, x∈X).
- 1c. Repeat Step 1b until we have chosen a total of k centers.
- (2-4. Proceed as with the standard k-means algorithm.)
Extension (in conclusion):
- Also, experiments showed that k-means++ generally performed better if it selected several new centers during each iteration, and then greedily chose the one that decreased φ as much as possible.
Detail (in original code):
- series –
- use_c –
plot(filename=None, axes=None, ts_height=10, bottom_margin=2, top_margin=2, ts_left_margin=0, ts_sample_length=1, tr_label_margin=3, tr_left_margin=2, ts_label_margin=0, show_ts_label=None, show_tr_label=None, cmap='viridis_r', ts_color=None)¶