
Numpy: split array into parts according to sequence of values

What I have is a large one-dimensional NumPy array of np.int16 data, plus a boolean array that records whether each sample (a block of samplesize consecutive elements of data) meets some criterion (is valid) or not (is invalid). Something like this:

import numpy as np

samplesize = 5
data = np.array([1, 2, 3, 4, 5, 3, 2, 1, 3, 2, 4, 5, 2, 1, 1], dtype=np.int16)
membership = np.array([False, True, False], dtype=bool)

Here membership[0] indicates whether data[0*samplesize : 1*samplesize] is valid.
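
To make the mapping between membership and data concrete, here is a minimal sketch that assumes len(data) is an exact multiple of samplesize. The names samples and valid_samples are introduced only for illustration and are not part of the question.

```python
import numpy as np

samplesize = 5
data = np.array([1, 2, 3, 4, 5, 3, 2, 1, 3, 2, 4, 5, 2, 1, 1], dtype=np.int16)
membership = np.array([False, True, False], dtype=bool)

# One row per sample: row i is data[i*samplesize : (i+1)*samplesize].
# Assumes len(data) is an exact multiple of samplesize.
samples = data.reshape(-1, samplesize)

# Boolean indexing keeps only the rows flagged as valid.
valid_samples = samples[membership]
print(valid_samples)  # [[3 2 1 3 2]]
```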

What I want is to split the data array into chunks according to runs of True values in the membership array. For example, if membership contains three or more consecutive True values, that run is treated as a meaningful sample of data.

Example

True, True, True, True - valid sequence
True, True, False, True, True - invalid sequence
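
The two cases above can be checked with a small pure-Python sketch. It uses itertools.groupby rather than anything NumPy-specific, and the helper name true_run_lengths is hypothetical. The threshold of three comes from the question.

```python
from itertools import groupby


def true_run_lengths(flags):
    """Return the length of each run of consecutive True values."""
    return [sum(1 for _ in group) for value, group in groupby(flags) if value]


min_run = 3  # threshold stated in the question

valid = [True, True, True, True]
invalid = [True, True, False, True, True]

print(true_run_lengths(valid))    # [4]
print(true_run_lengths(invalid))  # [2, 2]

# A sequence counts as valid only if some run reaches the threshold.
print(any(n >= min_run for n in true_run_lengths(valid)))    # True
print(any(n >= min_run for n in true_run_lengths(invalid)))  # False
```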

Assuming the start of the i-th valid sequence is start[i] and its end is end[i], I want to split the data array into pieces that run from start[i] * samplesize to end[i] * samplesize.

How could I accomplish this?

I don't fully understand your question. Do you want the start and end indices of the runs in membership that contain three or more consecutive True values?

Here is code to do that. The basic idea is to take diff(membership) and find the indices of the rising and falling edges:

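import numpy as np

# Random 0/1 flags as stand-in data
membership = np.random.randint(0, 2, 100)
# Pad with zeros so runs touching either end still produce an edge
d = np.diff(np.r_[0, membership, 0])
start = np.where(d == 1)[0]   # rising edges: first index of each run
end = np.where(d == -1)[0]    # falling edges: one past the last index of each run
# Keep only runs of three or more
mask = (end - start) >= 3
start = start[mask]
end = end[mask]

for s, e in zip(start, end):
    print(s, e, membership[s:e])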
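
To get from these indices to the pieces of data the question asks for, one possible sketch is to scale start and end by samplesize and slice. The function name split_valid_runs and the parameter min_run are hypothetical, and the sketch assumes len(data) == len(membership) * samplesize.

```python
import numpy as np


def split_valid_runs(data, membership, samplesize, min_run=3):
    """Return slices of data covering each run of >= min_run True flags."""
    # Cast to a signed integer type so the falling edge shows up as -1.
    flags = np.asarray(membership, dtype=np.int8)
    d = np.diff(np.r_[0, flags, 0])
    start = np.flatnonzero(d == 1)   # first sample index of each run
    end = np.flatnonzero(d == -1)    # one past the last sample index
    keep = (end - start) >= min_run
    return [data[s * samplesize:e * samplesize]
            for s, e in zip(start[keep], end[keep])]


# Made-up example: runs of length 3, 2 and 4; only the first and last qualify.
samplesize = 2
membership = np.array([True, True, True, False, True, True,
                       False, True, True, True, True], dtype=bool)
data = np.arange(membership.size * samplesize, dtype=np.int16)

chunks = split_valid_runs(data, membership, samplesize)
for chunk in chunks:
    print(chunk)
# [0 1 2 3 4 5]
# [14 15 16 17 18 19 20 21]
```

Each returned chunk is a view into data, so no samples are copied.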
