Finding similar sub-sequences in a time series?

Question

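I have thousands of time series (24-dimensional data, one dimension for each hour of the day). Within these time series, I'm interested in a particular sub-sequence or pattern that looks like this: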

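I'm interested in sub-sequences that resemble the overall shape of the highlighted section: a sharp negative slope, followed by a period of several hours in which the slope is relatively flat, ending with a steep positive slope. I know the sub-sequences I'm interested in won't match exactly and will most likely be shifted in time, scaled differently, have longer or shorter flat periods, and so on, but I would like to find a way to detect them all.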

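To do this, I've developed a simple heuristic (based on my definition of the highlighted section) to quickly find some of the sub-sequences of interest. However, I was wondering whether there is a more elegant way (in Python) to search thousands of time series for the sub-sequence I'm interested in, while taking into account the things mentioned above (differences in time, scale, etc.)?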

Answer 1

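Edit: a year later, I can't believe how much I overcomplicated flatline and slope detection; stumbling on the same question again, I realized it is as simple as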

idxs = np.where(x[1:] - x[:-1] == 0)[0]  # np.where returns a tuple; take the index array
idxs = np.union1d(idxs, idxs + 1)        # add back the right endpoint of each flat pair

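The first line is implemented efficiently via np.diff(x); likewise, to detect a slope > 5, for example, use np.diff(x) > 5. The second line is needed because differencing drops the right endpoints (e.g. diff([5,6,6,6,7]) = [1,0,0,1] -> idxs=[1,2], which excludes 3).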
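As a small worked illustration of the paragraph above, the following sketch runs the same idea on the array from the example. It assumes `x` is a 1-D NumPy array; the variable names are only for illustration.

```python
import numpy as np

x = np.array([5, 6, 6, 6, 7])

d = np.diff(x)                     # [1, 0, 0, 1]
flat = np.where(d == 0)[0]         # [1, 2]: left index of each equal pair
flat = np.union1d(flat, flat + 1)  # [1, 2, 3]: right endpoints restored

# Same pattern for steep rises: left index of every step larger than 5
steep = np.where(d > 5)[0]         # empty here, since no step exceeds 5
```

Note that a thresholded diff such as `d > 5` marks the left sample of each qualifying step, so the same endpoint adjustment applies if both samples of the step are wanted.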

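The functions below should do the job; the code is written with intuitive variable and method names and should be self-explanatory after some reading. The code is efficient and scalable.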

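Functionality:

Specify the minimum and maximum flatline length
Specify the minimum and maximum slope of the left and right tails
Specify the minimum and maximum average slope of the left and right tails, over multiple intervals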
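To make the last item concrete, here is a minimal sketch of what an "average slope over an interval" can mean: the change in value across k samples divided by k, measured into the left edge and out of the right edge of a flat section. The helper name `tail_slopes` and the sample data are hypothetical, chosen only for illustration, and the caller is assumed to keep `idx - k` and `idx + k` inside the array.

```python
import numpy as np

def tail_slopes(x, idx, intervals=(1, 2)):
    """Average slope into (left) and out of (right) sample idx, per interval."""
    x = np.asarray(x, dtype=float).ravel()
    left = [float((x[idx] - x[idx - k]) / k) for k in intervals]
    right = [float((x[idx + k] - x[idx]) / k) for k in intervals]
    return left, right

x = [9, 7, 3, 3, 3, 6, 8]       # steep drop, flat section at indices 2..4, steep rise
left, _ = tail_slopes(x, 2)     # slopes leading into the flat section
_, right = tail_slopes(x, 4)    # slopes leading out of the flat section
```

Here `left` is `[-4.0, -3.0]` and `right` is `[3.0, 2.5]`, so limits such as (-10, -2) on the left and (2, 10) on the right would accept this section at both interval lengths.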

Example (run it after defining the functions listed further down):

import numpy as np
import matplotlib.pyplot as plt

# Toy data
t = np.array([[ 5,  3,  3,  5,  3,  3,  3,  3,  3,  5,  5,  3,  3,  0,  4,  
                1,  1, -1, -1,  1,  1,  1,  1, -1,  1,  1, -1,  0,  3,  3,  
                5,  5,  3,  3,  3,  3,  3,  5,  7,  3,  3,  5]]).T
plt.plot(t)
plt.show()

# Get flatline indices
indices = get_flatline_indices(t, min_len=4, max_len=5)
plt.plot(t)
for idx in indices:
    plt.plot(idx, t[idx], marker='o', color='r')
plt.show()

# Filter by edge slopes
lims_left  = (-10, -2)
lims_right = (2,  10)
averaging_intervals = [1, 2, 3]
indices_filtered = filter_by_tail_slopes(indices, t, lims_left, lims_right,
                                         averaging_intervals)
plt.plot(t)
for idx in indices_filtered:
    plt.plot(idx, t[idx], marker='o', color='r')
plt.show()

Functions:

def get_flatline_indices(sequence, min_len=2, max_len=6):
    """Return the indices of flat runs that are at least min_len samples long."""
    indices = []
    elem_idx = 0
    max_elem_idx = len(sequence) - min_len

    while elem_idx < max_elem_idx:
        current_elem = sequence[elem_idx]
        next_elem    = sequence[elem_idx+1]

        if current_elem == next_elem:
            # Count the run length, stopping at the end of the sequence
            # (the unbounded lookup raised IndexError on a trailing flat run)
            flatline_len = 1
            while (elem_idx + flatline_len < len(sequence)
                   and sequence[elem_idx + flatline_len] == current_elem):
                flatline_len += 1

            if flatline_len >= min_len:
                # Runs longer than max_len are cut to their first max_len
                # samples; the remainder is scanned again as a new run
                flatline_len = min(flatline_len, max_len)
                indices += list(range(elem_idx, elem_idx + flatline_len))

            elem_idx += flatline_len
        else:
            elem_idx += 1
    return indices

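def filter_by_tail_slopes(indices, data, lims_left, lims_right, averaging_intervals=1):
    """Keep flat sections whose edge slopes fall within lims_left / lims_right.

    Returns a list of index ranges (one list per match), each extended on
    both sides by the interval length of the tail that matched.
    """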
    indices_filtered = []
    indices_temp, tails_temp = [], []
    got_left, got_right = False, False
    
    for idx in indices:
        slopes_left, slopes_right = _get_slopes(data, idx, averaging_intervals)
        
        for tail_left, slope_left in enumerate(slopes_left):
            if _valid_slope(slope_left, lims_left):
                if got_left:
                    indices_temp = []  # discard prev if twice in a row
                    tails_temp = []
                indices_temp.append(idx)
                tails_temp.append(tail_left + 1)
                got_left = True
        if got_left:
            for edge_right, slope_right in enumerate(slopes_right):
                if _valid_slope(slope_right, lims_right):
                    if got_right:
                        indices_temp.pop(-1)
                        tails_temp.pop(-1)
                    indices_temp.append(idx)
                    tails_temp.append(edge_right + 1)
                    got_right = True

        if got_left and got_right:
            left_append  = indices_temp[0] - tails_temp[0]
            right_append = indices_temp[1] + tails_temp[1]
            indices_filtered.append(_fill_range(left_append, right_append))
            indices_temp = []
            tails_temp = []
            got_left, got_right = False, False
    return indices_filtered

def _get_slopes(data, idx, averaging_intervals):
    if isinstance(averaging_intervals, int):
        averaging_intervals = [averaging_intervals]

    slopes_left, slopes_right = [], []
    for interval in averaging_intervals:
        slopes_left  += [(data[idx] - data[idx-interval]) / interval]
        slopes_right += [(data[idx+interval] - data[idx]) / interval]
    return slopes_left, slopes_right

def _valid_slope(slope, lims):
    min_slope, max_slope = lims
    return (slope  >= min_slope) and (slope <= max_slope)

def _fill_range(_min, _max):
    return list(range(_min, _max + 1))
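For comparison, flat-run detection with length limits can also be sketched in vectorized form using np.diff, in the spirit of the edit at the top of this answer. This is not the implementation above: the helper name `flat_runs` is hypothetical, it returns inclusive (start, end) pairs rather than a flat index list, and it discards runs longer than `max_len` instead of truncating them.

```python
import numpy as np

def flat_runs(x, min_len=2, max_len=None):
    """Return inclusive (start, end) index pairs of runs of equal values."""
    x = np.asarray(x).ravel()
    is_flat = np.diff(x) == 0                   # True where x[i] == x[i+1]
    # Pad with False so runs touching either end still produce both edges
    padded = np.concatenate(([False], is_flat, [False]))
    edges = np.diff(padded.astype(int))
    starts = np.where(edges == 1)[0]            # first sample of each run
    ends = np.where(edges == -1)[0]             # last sample of each run
    lengths = ends - starts + 1
    keep = lengths >= min_len
    if max_len is not None:
        keep &= lengths <= max_len
    return list(zip(starts[keep].tolist(), ends[keep].tolist()))

x = [5, 6, 6, 6, 7, 7, 1]
runs_any = flat_runs(x)              # [(1, 3), (4, 5)]
runs_long = flat_runs(x, min_len=3)  # [(1, 3)]
```

On the toy data from the example, with min_len=4 and max_len=5, this sketch yields the runs (4, 8), (19, 22) and (32, 36).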
