Identifying contiguous repeated subsequences in an array

Question

Suppose I have a problem where my sequence should look like this:

>>> np.repeat([1,2,3,4],6)
array([1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4,
       4, 4])

But in practice, because of noise, a faulty sensor, or some other cause, it looks more like this:

array([6, 1, 1, 6, 1, 2, 2, 4, 2, 2, 2, 3, 3, 3, 3, 3, 8, 4, 4, 6, 4, 4])

Some values are missing and others were recorded incorrectly.

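Now I want to:

Find the number of contiguous sequences and their identifiers (by this I mean that, for example, 1,1,1,1,1,1 is a sequence of length 6 whose values are all 1).
All contiguous sequences should have the same length, but because of noise and data corruption this may not show up in the data. I would also like to find the length of the contiguous sequences.
Finally, at a higher level, I want to know whether a sequence passed to a function has this structure (repeated contiguous sequences) at all: basically some kind of test that returns True or False depending on the nature of the sequence.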

Answer 1

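You can use medfilt from the scipy.signal module. It applies a median filter (kernel size 3 by default), which removes isolated outliers while keeping the step structure: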

from scipy import signal
import numpy as np
import matplotlib.pyplot as plt

org = np.array([6, 1, 1, 6, 1, 2, 2, 4, 2, 2, 2, 3, 3, 3, 3, 3, 8, 4, 4, 6, 4, 4])

# Median filter with the default kernel size of 3
filt = signal.medfilt(org)

plt.plot(range(len(org)), org, label='original')
plt.plot(range(len(filt)), filt, label='filtered')
plt.legend()
plt.show()

print(filt)

# Split the filtered signal wherever the value changes
sub_arrays = np.split(filt, np.where(np.diff(filt))[0]+1)
print(sub_arrays)
number_contiguous_sequences = len(sub_arrays)

for array in sub_arrays:
    print(len(array)) # gives 4, 7, 5, 6

filt:

[1. 1. 1. 1. 2. 2. 2. 2. 2. 2. 2. 3. 3. 3. 3. 3. 4. 4. 4. 4. 4. 4.]

sub_arrays:

[array([1., 1., 1., 1.]), array([2., 2., 2., 2., 2., 2., 2.]), array([3., 3., 3., 3., 3.]), array([4., 4., 4., 4., 4., 4.])]
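
The question also asks for a True/False test and an estimate of the common run length, which the code above does not cover. One possible way to build that on top of the filtered runs is sketched below. The function names (run_lengths, has_repeating_structure) and the parameters tolerance and min_run are hypothetical choices made for this illustration, not part of scipy, and the thresholds would need tuning for real data. The sketch assumes a non-empty input.

```python
import numpy as np
from scipy import signal


def run_lengths(values):
    """Return (run_values, run_lengths) for consecutive equal entries."""
    values = np.asarray(values)
    # Index 0 plus every index where the value differs from its predecessor.
    starts = np.concatenate(([0], np.where(np.diff(values) != 0)[0] + 1))
    lengths = np.diff(np.concatenate((starts, [len(values)])))
    return values[starts], lengths


def has_repeating_structure(seq, kernel_size=3, tolerance=2, min_run=3):
    """Hypothetical test: smooth with a median filter, then require more than
    one run, a typical run length of at least min_run, and every run length
    within tolerance of the typical (median) length."""
    filt = signal.medfilt(np.asarray(seq, dtype=float), kernel_size)
    ids, lengths = run_lengths(filt)
    typical = int(np.median(lengths))  # truncated estimate of the run length
    ok = bool(
        len(lengths) > 1
        and typical >= min_run
        and np.all(np.abs(lengths - typical) <= tolerance)
    )
    return ok, ids, typical


org = [6, 1, 1, 6, 1, 2, 2, 4, 2, 2, 2, 3, 3, 3, 3, 3, 8, 4, 4, 6, 4, 4]
ok, ids, typical = has_repeating_structure(org)
print(ok, ids.tolist(), typical)  # True [1.0, 2.0, 3.0, 4.0] 5
```

Note that the estimated length here is 5 rather than the intended 6, because two values are missing from the noisy input; the median of the run lengths is only an estimate.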
