
Time-series segmentation in Python

I am trying to segment time-series data as shown in the figure. I have a lot of data from sensors, and any of these recordings can contain a different number of isolated peak regions; the one in this figure has 3. I would like a function that takes the time series as input and returns the segmented sections, all of equal length.

My initial thought was to use a sliding window that calculates the relative change in amplitude. Since the windows containing peaks will show relatively larger changes, I could define a threshold on the relative change and keep the windows with isolated peaks. However, choosing the threshold is a problem, because the relative change is very sensitive to noise in the data.
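For reference, the sliding-window idea described above could be sketched roughly as follows. This is only an illustration under assumptions that are not in the question: the windows are non-overlapping for simplicity, "relative change" is taken to mean the peak-to-peak amplitude inside each window, and the threshold is derived from the median and the median absolute deviation (MAD) of the window activities, which assumes that most windows contain only noise. The names `window_activity`, `active_windows` and the factor `k` are hypothetical.

```python
import numpy as np

def window_activity(y, window):
    """Peak-to-peak amplitude of each non-overlapping window of `window` samples."""
    y = np.asarray(y, dtype=float)
    n = len(y) // window                      # a trailing partial window is dropped
    chunks = y[:n * window].reshape(n, window)
    return chunks.max(axis=1) - chunks.min(axis=1)

def active_windows(y, window, k=3.0):
    """Indices of windows whose activity exceeds median + k * MAD (assumed rule)."""
    activity = window_activity(y, window)
    med = np.median(activity)
    mad = np.median(np.abs(activity - med))   # robust estimate of the spread
    return np.flatnonzero(activity > med + k * mad)

# Example: a quiet ripple with one burst between samples 200 and 300
i = np.arange(500)
y = 0.1 * np.sin(0.9 * i)
y[200:300] += np.sin(np.linspace(0, 2 * np.pi, 100))
print(active_windows(y, window=50))
```

Because the threshold is computed from the data itself, it does not have to be tuned by hand for every recording, but the factor `k` still controls how sensitive the detection is to noise.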

Any suggestions?

Image: desired segmentation of the time-series data (figure with axes)

To do this you need to separate the signal from the noise.

  1. Take the mean value of your signal and apply a multiplier that places borders at the top and bottom of the noise (green dashed lines).
  2. Find the peak values below the bottom of the noise -> array, 2 groups of data.
  3. Find the peak values above the top of the noise -> array, 2 groups of data.
  4. Take the min index of the bottom of the first peak and the max index of the top of the first peak to get the first peak's range.
  5. Take the min index of the top of the second peak and the max index of the bottom of the second peak to get the second peak's range.

The code gives a more detailed description. With this method you can find other peaks as well. One thing you need to enter by hand is the x value between the peaks, which tells the program where to split the data into parts.

See the graphic for a summary.

import numpy as np
from matplotlib import pyplot as plt


# create noise data
def function(x, noise):
    y = np.sin(7*x+2) + noise
    return y

def function2(x, noise):
    y = np.sin(6*x+2) + noise
    return y


noise = np.random.uniform(low=-0.3, high=0.3, size=(100,))
x_line0 = np.linspace(1.95,2.85,100)
y_line0 = function(x_line0, noise)
x_line = np.linspace(0, 1.95, 100)
x_line2 = np.linspace(2.85, 3.95, 100)
x_pik = np.linspace(3.95, 5, 100)
y_pik = function2(x_pik, noise)
x_line3 = np.linspace(5, 6, 100)

# concatenate noise data
x = np.linspace(0, 6, 500)
y = np.concatenate((noise, y_line0, noise, y_pik, noise), axis=0)

# plot data
noise_band = 1.1
top_noise = y.mean()+noise_band*np.amax(noise)
bottom_noise = y.mean()-noise_band*np.amax(noise)
fig, ax = plt.subplots()
ax.axhline(y=y.mean(), color='red', linestyle='--')
ax.axhline(y=top_noise, linestyle='--', color='green')
ax.axhline(y=bottom_noise, linestyle='--', color='green')
ax.plot(x, y)

# split data into 2 signals
def split(arr, cond):
  return [arr[cond], arr[~cond]]

# find indexes of the data below the bottom noise border
bottom_data_indexes = np.argwhere(y < bottom_noise)
# split at x = 3, a value chosen by eye between the two peaks
splitted_bottom_data = split(bottom_data_indexes, bottom_data_indexes < np.argmax(x > 3))

# find indexes of the data above the top noise border
top_data_indexes = np.argwhere(y > top_noise)
# split at x = 3, a value chosen by eye between the two peaks
splitted_top_data = split(top_data_indexes, top_data_indexes < np.argmax(x > 3))

# get first signal range
first_signal_start = np.amin(splitted_bottom_data[0])
first_signal_end = np.amax(splitted_top_data[0])

# get x index of first signal
x_first_signal = np.take(x, [first_signal_start, first_signal_end])
ax.axvline(x=x_first_signal[0], color='orange')
ax.axvline(x=x_first_signal[1], color='orange')

# get second signal range
second_signal_start = np.amin(splitted_top_data[1])
second_signal_end = np.amax(splitted_bottom_data[1])

# get x values of second signal
x_second_signal = np.take(x, [second_signal_start, second_signal_end])
ax.axvline(x=x_second_signal[0], color='orange')
ax.axvline(x=x_second_signal[1], color='orange')

plt.show()

Output:

red line = mean value of all data

green lines = top and bottom noise borders

orange lines = selected peak data

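The manual split value could probably be avoided by grouping the out-of-band samples automatically. The following is a minimal sketch of that idea, not part of the answer above: it assumes that samples belonging to the same peak region are never more than `max_gap` samples apart, and that a fixed window of `length` samples centred on each region is an acceptable way to get equal-length sections. The names `find_regions`, `equal_length_segments`, `band`, `max_gap` and `length` are hypothetical.

```python
import numpy as np

def find_regions(y, band, max_gap):
    """Group samples outside mean +/- band into (start, stop) index ranges."""
    y = np.asarray(y, dtype=float)
    outside = np.flatnonzero(np.abs(y - y.mean()) > band)
    if outside.size == 0:
        return []
    # a new region starts wherever two consecutive outliers are more than max_gap apart
    breaks = np.flatnonzero(np.diff(outside) > max_gap)
    starts = np.concatenate(([outside[0]], outside[breaks + 1]))
    stops = np.concatenate((outside[breaks], [outside[-1]]))
    return list(zip(starts.tolist(), stops.tolist()))

def equal_length_segments(y, regions, length):
    """Cut one window of `length` samples centred on each region (assumes length <= len(y))."""
    y = np.asarray(y)
    segments = []
    for start, stop in regions:
        centre = (start + stop) // 2
        # shift the window if it would run past either end of the data
        lo = max(0, min(centre - length // 2, len(y) - length))
        segments.append(y[lo:lo + length])
    return segments

# Example: two sine bursts on a flat baseline
y = np.zeros(500)
burst = np.sin(np.linspace(0, 2 * np.pi, 100))
y[100:200] = burst
y[300:400] = burst
regions = find_regions(y, band=0.33, max_gap=20)
segments = equal_length_segments(y, regions, length=120)
```

Here `band` plays the role of `noise_band*np.amax(noise)` in the code above, and `max_gap` has to be larger than the stretch where a peak crosses the noise band but smaller than the distance between two peaks.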

1, It depends on how you want to define a "region", but it looks like you only have an intuition rather than a strict definition. If you have a very clear definition of the kind of piece you want to cut out, you can try a method such as a "matched filter".

2, You might want to detect the peaks of the absolute magnitude. If that does not work, try the peaks of the absolute magnitude of the first-order difference, or even the second-order difference.

3, It is hard to work with noisy data like this. My suggestion is to filter the data first and use the filtered signal to locate the sections, then cut those sections out of the unfiltered data. Filtering gives you smooth peaks, so the position of each peak can be detected from the sign change of the derivative. For filtering, try a simple "low-pass filter" first. If that doesn't work, I also suggest the "Hilbert–Huang transform".

*, It looks like you are using MATLAB. The methods mentioned are all included in MATLAB.
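Since the question asks about Python, here is a rough NumPy-only sketch of the suggestions in this answer. It is an illustration under assumptions, not a reference implementation: a moving average stands in for the low-pass filter (it is only a crude one), the matched filter is a plain cross-correlation with a zero-mean, unit-norm template, and the names `moving_average`, `extrema_by_sign_change` and `matched_filter` are hypothetical.

```python
import numpy as np

def moving_average(y, width):
    """Crude low-pass filter: average over `width` neighbouring samples."""
    kernel = np.ones(width) / width
    # mode="same" keeps the length but zero-pads, so the edges are attenuated
    return np.convolve(y, kernel, mode="same")

def extrema_by_sign_change(smooth):
    """Indices where the slope of a smoothed signal changes sign."""
    slope_sign = np.sign(np.diff(smooth))
    change = np.diff(slope_sign)
    maxima = np.flatnonzero(change < 0) + 1   # slope goes from + to -
    minima = np.flatnonzero(change > 0) + 1   # slope goes from - to +
    return maxima, minima

def matched_filter(y, template):
    """Cross-correlate y with a zero-mean, unit-norm template (must not be constant)."""
    t = np.asarray(template, dtype=float)
    t = t - t.mean()
    t = t / np.linalg.norm(t)
    y = np.asarray(y, dtype=float)
    # score[i] compares y[i:i + len(t)] with the template
    return np.correlate(y - y.mean(), t, mode="valid")

# Example: locate a known pulse shape inside a longer recording
template = np.sin(np.linspace(0, 2 * np.pi, 50))
y = np.zeros(300)
y[120:170] = template
start = int(np.argmax(matched_filter(y, template)))

# Example: peaks of the absolute magnitude after smoothing
maxima, minima = extrema_by_sign_change(moving_average(np.abs(y), 5))
```

For suggestion 2, the same `extrema_by_sign_change` can be applied to `np.abs(np.diff(y))` instead of `np.abs(y)`. Once the positions are known, the sections themselves can be cut from the unfiltered `y`.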
