dtaidistance.subsequence.dtw

(requires version 2.3.0 or higher)

DTW-based subsequence matching.

author:Wannes Meert
copyright:Copyright 2021-2022 KU Leuven, DTAI Research Group.
license:Apache License, Version 2.0, see LICENSE for details.
class dtaidistance.subsequence.dtw.LCMatch(lc, row=None, col=None)

LocalConcurrences match

path
class dtaidistance.subsequence.dtw.LocalConcurrences(series1, series2=None, gamma=1, tau=0, delta=0, delta_factor=1, only_triu=False, penalty=None, window=None)

Version identification based on local concurrences.

Find recurring patterns across two time series. Used to identify whether one time series is a version of another. If the two time series are the same one, it can be used to find typical or frequent patterns in a time series.

Based on 7.3.2 Identification Procedure in Fundamentals of Music Processing, Meinard Müller, Springer, 2015.

Different from the original formulation, D_tau is introduced based on the given delta factor. This makes the penalty less sensitive to the cumulative effect of the paths in the self-similarity matrix S:

S_tau(n,m) = S(n,m) if S(n,m) >= tau (with tau >= 0)
delta if S(n,m) < tau (with delta <= 0)

And for the accumulated score matrix D:

D_tau(n,m) = max(0,
df * D_tau(n−1,m−1) + S_tau(n,m), df * D_tau(n−1,m) + S_tau(n,m), df * D_tau(n,m−1) + S_tau(n,m))

where df = 1 if S(n,m) >= tau and df=delta_factor (<=1) otherwise,

Parameters:
  • series1 – First time series.
  • series2 – Second time series. If empty, series1 is used and compared with itself.
  • gamma – Affinity transformation exp(-gamma*(s1[i] - s2[j])**2), should be >0
  • tau – threshold parameter, should be >= 0
  • delta – penalty parameter, should be <= 0
  • delta_factor – penalty factor parameter, should be <= 1
  • only_triu – Only consider upper triangular matrix in warping paths.
align()
Returns:
best_match()
best_path(row, col)
estimate_settings_from_std(series, tau_std=0.33)
Parameters:
  • series
  • tau_std – Set tau to differences larger than tau_std time standard deviation of the given series (default is 0.33, or reject differences that are larger than the deviation wrt to the mean of 75% of the values in the series, assuming a normal distribution).
Returns:

kbest_matches(k=1, minlen=2, buffer=0)

Yields the next best match. Stops at k matches (use None for all matches).

Parameters:
  • k – Number of matches to yield. None is all matches.
  • minlen – Consider only matches of length longer than minlen
  • buffer – Matches cannot be closer than buffer to each other.
Returns:

Yield an LCMatch object

reset()
wp
class dtaidistance.subsequence.dtw.SAMatch(idx, alignment)

SubsequenceAlignment match

distance

DTW distance of match.

This value is dependent on the length of the query. Use the value property when comparing queries of different lengths.

path

Matched path in series

segment

Matched segment in series.

value

Normalized DTW distance of match.

Normalization is the DTW distance divided by the query length.

class dtaidistance.subsequence.dtw.SSMatch(kidx, ss)

Found match by SubsequenceSearch.

The match is identified by the idx property, which is the index of the matched series in the original list of series. The distance property returns the DTW distance between the query and the series at index idx.

distance

DTW distance.

idx
value

Normalized DTW distance.

class dtaidistance.subsequence.dtw.SSMatches(ss)
class dtaidistance.subsequence.dtw.SubsequenceAlignment(query, series, penalty=0.1, use_c=False)

Subsequence alignment using DTW. Find where the query occurs in the series.

Based on Fundamentals of Music Processing, Meinard Müller, Springer, 2015.

Example:

query = np.array([1., 2, 0])
series = np.array([1., 0, 1, 2, 1, 0, 2, 0, 3, 0, 0])
sa = subsequence_search(query, series)
mf = sa.matching_function()
sa.kbest_matches(k=2)
Parameters:
  • query – Subsequence to search for
  • series – Long sequence in which to search
  • penalty – Penalty for non-diagonal matching
  • use_c – Use the C-based DTW function if available
align()
align_fast()
best_match()
best_match_fast()
get_match(idx)
kbest_matches(k=1, overlap=0)

Yields the next best match. Stops at k matches (use None for all matches).

Parameters:
  • k – Number of matches to yield. None is all matches.
  • overlap – Matches cannot overlap unless overlap > 0.
Returns:

Yield an SAMatch object

kbest_matches_fast(k=1, overlap=0)
matching_function()

The matching score for each end-point of a possible match.

matching_function_bestpath(idx)

Indices in series for best path for match in matching function at idx.

Parameters:idx – Index in matching function
Returns:List of (row, col)
matching_function_endpoint(idx)

Index in series for end of match in matching function at idx.

Parameters:idx – Index in matching function
Returns:Index in series
matching_function_segment(idx)

Matched segment in series.

matching_function_startpoint(idx)

Index in series for start of match in matching function at idx.

Parameters:idx – Index in matching function
Returns:Index in series
reset()
warping_paths()

Get matrix with all warping paths.

If the aligmnent was computed using a compact, the paths are first copied into a full warping paths matrix.

Returns:Numpy matrix of size (len(query)+1) * (len(series)+1)
class dtaidistance.subsequence.dtw.SubsequenceSearch(query, s, dists_options=None, use_lb=True, keep_all_distances=False, max_dist=None, max_value=None, use_c=None, use_ndim=None)

Search the best matching (subsequence) time series compared to a given time series.

Parameters:
  • query – Time series to search for
  • s – Iterator over time series to perform search on. This can be for example windows over a long time series.
  • dists_options – Options passed on to dtw.distance
  • use_lb – Use lowerbounds to early abandon options
  • max_dist – Ignore DTW distances larger than this value if max_dist is also given in dists_options, then the one in dists_options is ignored if both max_dist and max_value are given, the smallest is used
  • max_value – Ignore normalized DTW distances larger than this value
align(k=None)
align_fast(k=None)
best_match()
best_match_fast()
get_ith_value(i)

Return the i-th value from the k-best values.

Parameters:i – Return i-th best value (i < k)
Returns:(distance, index)
kbest_matches(k=1)

Return the k best matches.

It is recommended to set k to a value, and not None. If k is set to None, all comparisons are kept and returned. Also no early stopping is applied in case k is None.

Parameters:k – Number of best matches to return (default is 1)
Returns:List of SSMatch objects
kbest_matches_fast(k=1)
reset()
dtaidistance.subsequence.dtw.local_concurrences(series1, series2=None, gamma=1, tau=0, delta=0, delta_factor=1, estimate_settings=None, only_triu=False, penalty=None, window=None)

Local concurrences, see LocalConcurrences.

Parameters:
  • series1
  • series2
  • gamma – Affinity transformation exp(-gamma*(s1[i] - s2[j])**2)
  • tau – threshold parameter
  • delta – penalty parameter Should be negative. Added instead of the affinity score (if score below tau threshold parameter).
  • delta_factor – multiply cumulative score (e.g. by 0.5). This is useful to have the same impact at different locations in the warping paths matrix, which is cumulative (and thus typically large in one corner and small in the opposite corner).
  • estimate_settings – Estimate tau, delta, delta_factor from given series. Will be passed as tau_std to estimate_settings_from_std.
  • only_triu – Only compute the upper traingle matrix values. Useful to avoid redundant computations when series1 is equal to series2 (or equivalently if series2 is None).
  • penalty – Penalty that is added when dynamic programming is using moving vertically or horizontally through the matrix instead of diagonally. Used to prefer diagonal paths.
Returns:

dtaidistance.subsequence.dtw.subsequence_alignment(query, series, use_c=False)

See SubsequenceAligment.

Parameters:
  • query
  • series
Returns:

See SubsequenceSearch.

Parameters:
  • query – Time series to search for
  • series – Iterator over time series to perform search on. This can be for example windows over a long time series.
  • dists_options – Options passed on to dtw.distance
  • use_lb – Use lowerbounds to early abandon options
  • max_dist – Ignore DTW distances larger than this value
  • max_value – Ignore normalized DTW distances larger than this value
  • use_c – Use fast C implementation if available
Returns:

SubsequenceSearch object