dtaidistance.clustering.kmeans¶
(requires version 2.2.0 or higher)
Time series clustering using k-means and Barycenter averages.
author: | Wannes Meert |
---|---|
copyright: | Copyright 2020-2022 KU Leuven, DTAI Research Group. |
license: | Apache License, Version 2.0, see LICENSE for details. |
-
class
dtaidistance.clustering.kmeans.
KMeans
(k, max_it=10, max_dba_it=10, thr=0.0001, drop_stddev=None, nb_prob_samples=None, dists_options=None, show_progress=True, initialize_with_kmedoids=False, initialize_with_kmeanspp=True, initialize_sample_size=None)¶ K-means clustering algorithm for time series using Dynamic Barycenter Averaging.
Parameters: - k – Number of components
- max_it – Maximal interations for K-means
- max_dba_it – Maximal iterations for the Dynamic Barycenter Averaging.
- thr – Convergence is achieved if the averaging iterations differ less then this threshold
- drop_stddev – When computing the average series per cluster, ignore the instances that are further away than stddev*drop_stddev from the prototype (this is a gradual effect, the algorithm starts with drop_stddev is 3). This is related to robust k-means approaches that use trimming functions.
- nb_prob_samples – Probabilistically sample best path this number of times.
- dists_options –
- show_progress –
- initialize_with_kmedoids – Cluster a sample of the dataset first using K-medoids.
- initialize_with_kmeanspp – Use k-means++
- initialize_sample_size – How many samples to use for initialization with K-medoids or K-means++. Defaults are k*20 for K-medoid and 2+log(k) for k-means++.
-
fit
(series, use_c=False, use_parallel=True)¶ Perform K-means clustering.
Parameters: - series – Container with series
- use_c – Use the C-library (only available if package is compiled)
- use_parallel – Use multipool for parallelization
Returns: cluster indices, number of iterations If the number of iterations is equal to max_it, the clustering did not converge.
-
fit_fast
(series)¶
-
kmeansplusplus_centers
(series, use_c=False)¶ Better initialization for K-Means.
Arthur, D., and S. Vassilvitskii. “k-means++: the, advantages of careful seeding. In, SODA’07: Proceedings of the.” eighteenth annual ACM-SIAM symposium on Discrete, algorithms.Procedure (in paper):
- 1a. Choose an initial center c_1 uniformly at random from X.
- 1b. Choose the next center c_i , selecting c_i = x′∈X with probability D(x’)^2/sum(D(x)^2, x∈X).
- 1c. Repeat Step 1b until we have chosen a total of k centers.
- (2-4. Proceed as with the standard k-means algorithm.)
Extension (in conclusion):
- Also, experiments showed that k-means++ generally performed better if it selected several new centers during each iteration, and then greedily chose the one that decreased φ as much as possible.
Detail (in original code):
- numLocalTries==2+log(k)
Parameters: - series –
- use_c –
Returns:
-
kmedoids_centers
(series, use_c=False)¶
-
plot
(filename=None, axes=None, ts_height=10, bottom_margin=2, top_margin=2, ts_left_margin=0, ts_sample_length=1, tr_label_margin=3, tr_left_margin=2, ts_label_margin=0, show_ts_label=None, show_tr_label=None, cmap='viridis_r', ts_color=None)¶