Finding similar sub-sequences in a time series?
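I have thousands of time series (24-dimensional data, one dimension for each hour of the day). Within these time series, I'm interested in a particular sub-sequence or pattern that looks like this:
[Figure: example time series with the sub-sequence of interest highlighted]
I'm interested in sub-sequences that resemble the overall shape of the highlighted section: a sharp negative slope, followed by a period of several hours where the slope is relatively flat, and finally a sharp positive slope. I know the sub-sequences I'm interested in won't match each other exactly and will most likely be shifted in time, scaled differently, have longer or shorter flat periods, etc., but I would like to find a way to detect them all.
To do this, I have developed a simple heuristic (based on my definition of the highlighted section) to quickly find some of the sub-sequences of interest. However, I was wondering whether there is a more elegant way (in Python) to search thousands of time series for the sub-sequence I'm interested in, while taking into account the things mentioned above (differences in time, scale, etc.)?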
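The shape described in the question (steep drop, flat stretch, steep rise) can be sketched as a pattern over labelled steps. This is only one possible illustration, not the asker's heuristic: the function name `find_dip_patterns` and all threshold values are assumptions. Each step of `np.diff` is labelled as a steep drop, a steep rise, roughly flat, or other, and a regular expression then matches "one or more drops, a bounded number of flat steps, one or more rises", which tolerates shifts in time and flat stretches of varying length.

```python
import re
import numpy as np

def find_dip_patterns(x, steep=2.0, flat=0.5, min_flat=2, max_flat=6):
    """Return (start, end) sample index pairs of 'steep drop, flat, steep rise'.

    All thresholds are illustrative assumptions; `flat` should be < `steep`.
    """
    d = np.diff(np.asarray(x, dtype=float))   # d[i] = x[i+1] - x[i]

    # Label each step: D = steep drop, U = steep rise, F = roughly flat, x = other
    labels = np.full(d.shape, "x", dtype="<U1")
    labels[d <= -steep] = "D"
    labels[d >= steep] = "U"
    labels[np.abs(d) <= flat] = "F"
    step_string = "".join(labels)

    pattern = re.compile(r"D+F{%d,%d}U+" % (min_flat, max_flat))
    # Step i spans samples i..i+1, so a match over steps [a, b) covers samples a..b
    return [(m.start(), m.end()) for m in pattern.finditer(step_string)]

# Hypothetical 24-hour series: high, then a 5-hour dip, then high again
day = [5] * 8 + [1] * 5 + [5] * 11
print(find_dip_patterns(day))  # [(7, 13)]
```

To cope with differently scaled series, the thresholds could be expressed relative to each series (for example as a multiple of its standard deviation); that choice is left open here.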
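Edit: a year later, I can't believe how overcomplicated I made flatline and slope detection; stumbling on the same question again, I realized it is as simple as this (for a 1-D array x):
idxs = np.where(x[1:] - x[:-1] == 0)[0]
idxs = sorted({i for idx in idxs for i in (idx, idx + 1)})
The first line is implemented efficiently via np.diff(x); likewise, to detect e.g. slope > 5, use np.diff(x) > 5. The second line is needed because differencing drops the right endpoints (e.g. diff([5,6,6,6,7]) = [1,0,0,1] -> idxs=[1,2], which excludes 3).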
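As a small worked example of the np.diff idea, the sketch below finds flat points (including the right endpoints that differencing drops) and steep steps in a toy 1-D array. The array values and the threshold of 5 are illustrative assumptions.

```python
import numpy as np

# Toy 1-D series (illustrative values, not from the question's data)
x = np.array([5, 6, 6, 6, 7, 1, 9])

d = np.diff(x)                      # d[i] = x[i+1] - x[i]

# Left endpoints of flat steps, then add the right endpoints that diff drops
flat_left = np.where(d == 0)[0]
flat_idxs = np.union1d(flat_left, flat_left + 1)

# Steps steeper than a threshold (the value 5 is an assumption)
steep_up = np.where(d > 5)[0]       # rise of more than 5 per step
steep_down = np.where(d < -5)[0]    # fall of more than 5 per step

print(flat_idxs.tolist())   # [1, 2, 3]
print(steep_up.tolist())    # [5]
print(steep_down.tolist())  # [4]
```

Here `steep_up` and `steep_down` hold the index of the left sample of each steep step, so the step at index 4 runs from x[4] = 7 down to x[5] = 1.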
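The functions below should do the job; the code is written with intuitive variable and method names and should be self-explanatory after a read-through. The code is efficient and scalable.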
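Functionality: specify the minimum and maximum flatline length, the allowed slope range for the left and right tails, and one or more intervals over which those tail slopes are averaged.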
Example:
import numpy as np
import matplotlib.pyplot as plt

# Assumes get_flatline_indices and filter_by_tail_slopes
# (listed after this example) are already defined

# Toy data
t = np.array([[ 5,  3,  3,  5,  3,  3,  3,  3,  3,  5,  5,  3,  3,  0,  4,
                1,  1, -1, -1,  1,  1,  1,  1, -1,  1,  1, -1,  0,  3,  3,
                5,  5,  3,  3,  3,  3,  3,  5,  7,  3,  3,  5]]).T
plt.plot(t)
plt.show()

# Get flatline indices
indices = get_flatline_indices(t, min_len=4, max_len=5)
plt.plot(t)
for idx in indices:
    plt.plot(idx, t[idx], marker='o', color='r')
plt.show()

# Filter by edge slopes
lims_left = (-10, -2)
lims_right = (2, 10)
averaging_intervals = [1, 2, 3]
indices_filtered = filter_by_tail_slopes(indices, t, lims_left, lims_right,
                                         averaging_intervals)
plt.plot(t)
for idx in indices_filtered:
    plt.plot(idx, t[idx], marker='o', color='r')
plt.show()
Functions:
def get_flatline_indices(sequence, min_len=2, max_len=6):
    # Collect the indices of runs of equal consecutive values ("flatlines")
    indices = []
    elem_idx = 0
    max_elem_idx = len(sequence) - min_len

    while elem_idx < max_elem_idx:
        current_elem = sequence[elem_idx]
        next_elem = sequence[elem_idx + 1]
        flatline_len = 0

        if current_elem == next_elem:
            # Count the run length, stopping at the end of the sequence
            # (the unchecked lookup raised IndexError for a trailing flatline)
            while (elem_idx + flatline_len < len(sequence)
                   and sequence[elem_idx + flatline_len] == current_elem):
                flatline_len += 1

            if flatline_len >= min_len:
                # Clip to max_len; any remainder of the run is re-scanned next pass
                flatline_len = min(flatline_len, max_len)
                indices += list(range(elem_idx, elem_idx + flatline_len))
            elem_idx += flatline_len
        else:
            elem_idx += 1
    return indices

def filter_by_tail_slopes(indices, data, lims_left, lims_right, averaging_intervals=1):
    indices_filtered = []
    indices_temp, tails_temp = [], []
    got_left, got_right = False, False

    for idx in indices:
        slopes_left, slopes_right = _get_slopes(data, idx, averaging_intervals)

        # Look for a valid left tail (slope within lims_left) ending at idx
        for tail_left, slope_left in enumerate(slopes_left):
            if _valid_slope(slope_left, lims_left):
                if got_left:
                    indices_temp = []  # discard prev if twice in a row
                    tails_temp = []
                indices_temp.append(idx)
                tails_temp.append(tail_left + 1)
                got_left = True

        # Once a left tail is found, look for a valid right tail starting at idx
        if got_left:
            for edge_right, slope_right in enumerate(slopes_right):
                if _valid_slope(slope_right, lims_right):
                    if got_right:
                        indices_temp.pop(-1)
                        tails_temp.pop(-1)
                    indices_temp.append(idx)
                    tails_temp.append(edge_right + 1)
                    got_right = True

        # Both tails found: store the full index range and reset the state
        if got_left and got_right:
            left_append = indices_temp[0] - tails_temp[0]
            right_append = indices_temp[1] + tails_temp[1]
            indices_filtered.append(_fill_range(left_append, right_append))
            indices_temp = []
            tails_temp = []
            got_left, got_right = False, False
    return indices_filtered

def _get_slopes(data, idx, averaging_intervals):
    # Average slope over each interval to the left and right of idx;
    # assumes idx - interval >= 0 and idx + interval < len(data)
    if isinstance(averaging_intervals, int):
        averaging_intervals = [averaging_intervals]

    slopes_left, slopes_right = [], []
    for interval in averaging_intervals:
        slopes_left += [(data[idx] - data[idx - interval]) / interval]
        slopes_right += [(data[idx + interval] - data[idx]) / interval]
    return slopes_left, slopes_right

def _valid_slope(slope, lims):
    min_slope, max_slope = lims
    return (slope >= min_slope) and (slope <= max_slope)

def _fill_range(_min, _max):
    # Inclusive integer range [_min, _max]
    return list(range(_min, _max + 1))
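In the spirit of the np.diff edit at the top of the answer, flat runs can also be located without an explicit Python loop. The sketch below is an alternative illustration, not the answer's get_flatline_indices: the name `flat_runs` is hypothetical, it returns (start, stop) pairs instead of a flat index list, and it does not clip runs to a maximum length.

```python
import numpy as np

def flat_runs(x, min_len=2):
    """Return (start, stop) pairs, stop exclusive, of runs of equal values."""
    x = np.asarray(x).ravel()
    if x.size == 0:
        return []
    # A nonzero difference at step i means a new run starts at sample i + 1
    change = np.flatnonzero(np.diff(x) != 0) + 1
    starts = np.concatenate(([0], change))
    stops = np.concatenate((change, [x.size]))
    keep = (stops - starts) >= min_len   # keep only runs that are long enough
    return list(zip(starts[keep].tolist(), stops[keep].tolist()))

# Toy series (illustrative): a run of four 3s and a trailing run of two 1s
x = [5, 3, 3, 3, 3, 5, 1, 1]
print(flat_runs(x, min_len=2))  # [(1, 5), (6, 8)]
print(flat_runs(x, min_len=3))  # [(1, 5)]
```

Because the last run is closed with `x.size`, a flatline that extends to the end of the series is reported like any other.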