
Fill zero values of 1d numpy array with last non-zero values

Let's say we have a 1D NumPy array filled with some int values, and some of them are 0.

Is there any way, taking advantage of NumPy's vectorized operations, to fill every 0 with the last non-zero value found before it?

For example:

import numpy as np

arr = np.array([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2])
fill_zeros_with_last(arr)  # fills the zeros in place
print(arr)

[1 1 1 2 2 4 6 8 8 8 8 8 2]

One way to do it is with this function:

def fill_zeros_with_last(arr):
    """Fill zeros in place with the last non-zero value seen so far."""
    last_val = None  # I don't really care about the initial value; leading zeros stay 0
    for i in range(arr.size):
        if arr[i]:
            last_val = arr[i]
        elif last_val is not None:
            arr[i] = last_val

However, this uses a plain Python for loop instead of taking advantage of NumPy and SciPy.

If we knew that only short runs of consecutive zeros were possible, we could use something based on numpy.roll. The problem is that a run of consecutive zeros can potentially be very long...
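For the bounded case, a minimal sketch of the numpy.roll idea might look like the following. It assumes the caller knows an upper bound `max_run` on the length of any run of zeros; `fill_zeros_roll` and `max_run` are hypothetical names, not part of NumPy. Each pass copies every zero's left neighbour into it, so after `max_run` passes every run of at most that length is filled.

```python
import numpy as np


def fill_zeros_roll(arr, max_run):
    """Forward-fill zeros with np.roll, assuming no run of zeros is longer than max_run."""
    out = arr.copy()
    if out.size == 0:
        return out
    for _ in range(max_run):
        shifted = np.roll(out, 1)  # shifted[i] == out[i - 1]
        shifted[0] = 0             # np.roll wraps around; don't pull the last element into position 0
        mask = out == 0
        out[mask] = shifted[mask]  # each zero takes its left neighbour's current value
    return out


arr = np.array([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2])
print(fill_zeros_roll(arr, max_run=4))
```

The cost is `max_run` full passes over the array, and if `max_run` is too small the tail of a longer run stays zero, which is exactly why this breaks down when runs can be arbitrarily long.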

Any ideas? Or should we go straight to Cython?

Disclaimer:

A long time ago I believe I found a Stack Overflow question asking something like this, or very similar, but I wasn't able to find it again. :-(

Maybe I missed the right search terms; if so, sorry for the duplicate. Or maybe it was just my imagination...

Here's a solution using np.maximum.accumulate:

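import numpy as np

def fill_zeros_with_last(arr):
    prev = np.arange(len(arr))          # each position's own index
    prev[arr == 0] = 0                  # zero entries point back to index 0 for now
    prev = np.maximum.accumulate(prev)  # carry the last non-zero index forward
    return arr[prev]                    # gather the corresponding values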

We construct an array prev with the same length as arr, such that prev[i] is the index of the last non-zero entry of arr at or before position i (or 0 if there is none). For example, if:

>>> arr = np.array([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2])

Then prev looks like:

array([ 0,  0,  0,  3,  3,  5,  6,  7,  7,  7,  7,  7, 12])
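To see where that prev comes from, here is a small sketch that prints the intermediate array before and after the running maximum, using the same example arr. Because the kept indices increase from left to right, the running maximum at position i is the largest index j <= i with arr[j] != 0; zeros before the first non-zero entry fall back to index 0.

```python
import numpy as np

arr = np.array([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2])

masked = np.arange(len(arr))          # each position starts out pointing at itself
masked[arr == 0] = 0                  # zero entries point at index 0 for now
print(masked.tolist())                # [0, 0, 0, 3, 0, 5, 6, 7, 0, 0, 0, 0, 12]

prev = np.maximum.accumulate(masked)  # running maximum carries the last non-zero index forward
print(prev.tolist())                  # [0, 0, 0, 3, 3, 5, 6, 7, 7, 7, 7, 7, 12]
print(arr[prev].tolist())             # [1, 1, 1, 2, 2, 4, 6, 8, 8, 8, 8, 8, 2]
```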

Then we just index into arr with prev to obtain the result. A test:

>>> arr = np.array([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2])
>>> fill_zeros_with_last(arr)
array([1, 1, 1, 2, 2, 4, 6, 8, 8, 8, 8, 8, 2])

Note: be careful to understand what this does when the first entry of your array is zero:

>>> fill_zeros_with_last(np.array([0,0,1,0,0]))
array([0, 0, 1, 1, 1])
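Leading zeros are "filled" with arr[0], which is itself 0. If they should get some other value instead, one option is to overwrite every position before the first non-zero entry after the gather. The sketch below assumes a caller-chosen `initial` value; `fill_zeros_with_last_initial` is a hypothetical name.

```python
import numpy as np


def fill_zeros_with_last_initial(arr, initial=0):
    """np.maximum.accumulate forward fill; positions before the first non-zero get `initial`."""
    prev = np.arange(len(arr))
    prev[arr == 0] = 0
    prev = np.maximum.accumulate(prev)
    out = arr[prev]                     # fancy indexing returns a copy, so arr is not modified
    leading = np.cumsum(arr != 0) == 0  # True until the first non-zero entry has been seen
    out[leading] = initial              # `initial` is cast to arr's dtype
    return out


print(fill_zeros_with_last_initial(np.array([0, 0, 1, 0, 0]), initial=-1))
```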

Inspired by jme's answer here and by Bas Swinckels' (in the linked question), I came up with a different combination of NumPy functions:

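import numpy as np

def fill_zeros_with_last(arr, initial=0):
    ind = np.nonzero(arr)[0]                    # indices of the non-zero entries
    if ind.size == 0:                           # all zeros: ind[cnt - 1] would raise IndexError
        return np.full_like(arr, initial)
    cnt = np.cumsum(np.array(arr, dtype=bool))  # number of non-zeros seen so far
    return np.where(cnt, arr[ind[cnt - 1]], initial)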

I think it's succinct and it works, so I'm posting it here for the record. Still, jme's answer is also succinct, easy to follow, and seems to be faster, so I'm accepting it :-)
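To check the "seems to be faster" remark on your own data, a rough timing sketch like the one below could be used. The test data (100,000 values with roughly 70% zeros) is a hypothetical choice, and no timings are claimed here: the results depend on the machine, the NumPy version, and the zero density. The function names are also hypothetical labels for the two answers.

```python
import timeit

import numpy as np


def fill_max_accumulate(arr):
    # the np.maximum.accumulate answer
    prev = np.arange(len(arr))
    prev[arr == 0] = 0
    prev = np.maximum.accumulate(prev)
    return arr[prev]


def fill_cumsum_where(arr, initial=0):
    # the cumsum / np.where answer (assumes arr has at least one non-zero entry)
    ind = np.nonzero(arr)[0]
    cnt = np.cumsum(np.array(arr, dtype=bool))
    return np.where(cnt, arr[ind[cnt - 1]], initial)


# hypothetical test data: values in 1..9 with roughly 70% of them zeroed out
rng = np.random.default_rng(0)
data = rng.integers(1, 10, size=100_000)
data[rng.random(data.size) < 0.7] = 0

# both answers should agree (leading zeros become 0 in each)
assert np.array_equal(fill_max_accumulate(data), fill_cumsum_where(data))

for name, fn in [("maximum.accumulate", fill_max_accumulate), ("cumsum/where", fill_cumsum_where)]:
    seconds = timeit.timeit(lambda: fn(data), number=20) / 20
    print(f"{name}: {seconds * 1e3:.3f} ms per call")
```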

If the 0s only come in runs of length 1 (isolated zeros), this use of nonzero might work:

In [266]: arr=np.array([1,0,2,3,0,4,0,5])
In [267]: I=np.nonzero(arr==0)[0]
In [268]: arr[I] = arr[I-1]
In [269]: arr
Out[269]: array([1, 1, 2, 3, 3, 4, 4, 5])

I can handle your arr by applying this repeatedly until I is empty:

In [286]: arr = np.array([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2])

In [287]: while True:
   .....:     I=np.nonzero(arr==0)[0]
   .....:     if len(I)==0: break
   .....:     arr[I] = arr[I-1]
   .....:     

In [288]: arr
Out[288]: array([1, 1, 1, 2, 2, 4, 6, 8, 8, 8, 8, 8, 2])
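The loop above assumes arr[0] is non-zero. When arr[0] == 0, I - 1 contains -1, which wraps around to the last element (for [0, 0, 5] the loop ends with [5, 5, 5]), and for an all-zero array I never becomes empty, so the loop never terminates. A minimal guarded sketch of the same repeated-pass idea follows; `fill_zeros_repeated` is a hypothetical name, and it only fills zeros whose left neighbour is already non-zero, so each pass fills at least one zero or the loop stops.

```python
import numpy as np


def fill_zeros_repeated(arr):
    """Repeated-pass forward fill that skips index 0 and always terminates."""
    out = arr.copy()
    while True:
        # zeros at positions >= 1 whose left neighbour is already non-zero;
        # starting from 1 avoids I - 1 == -1 wrapping around to the last element
        I = np.nonzero((out[1:] == 0) & (out[:-1] != 0))[0] + 1
        if I.size == 0:
            break
        out[I] = out[I - 1]
    return out


print(fill_zeros_repeated(np.array([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2])))
```

Leading zeros are left as 0, matching the np.maximum.accumulate answer.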

If the runs of 0s are long, it might be better to look for those runs and handle each one as a block. But if most runs are short, this repeated application may be the fastest route.
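One possible way to handle the runs as blocks is sketched below with np.repeat: every non-zero value is repeated over its own position plus the run of zeros that follows it. This is an illustration under that interpretation, not necessarily what the answer had in mind; `fill_zero_runs` is a hypothetical name.

```python
import numpy as np


def fill_zero_runs(arr):
    """Fill each run of zeros as a block by repeating the preceding non-zero value."""
    out = arr.copy()
    nz = np.flatnonzero(arr)  # positions of the non-zero entries
    if nz.size == 0:          # nothing to carry forward
        return out
    # each non-zero value covers itself plus the zeros up to the next non-zero (or the end)
    lengths = np.diff(np.append(nz, len(arr)))
    out[nz[0]:] = np.repeat(arr[nz], lengths)
    return out                # zeros before the first non-zero entry are left as 0


print(fill_zero_runs(np.array([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2])))
```

The work here is a fixed number of whole-array operations regardless of how long the runs are, unlike the repeated-pass loop, whose pass count grows with the longest run.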

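Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original post's URL. For any questions, contact yoyou2525@163.com.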

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM