Handling np.NaN When Calculating a Row-wise Moving Average of a 2D NumPy Array

I'm trying to obtain an array containing the moving averages along the rows of a 2-dimensional NumPy array, based on a certain 'window' (i.e. the number of rows included in the average) and an 'offset'. I've come up with the code below, which I know is not efficient:

import numpy as np

def f(array, window, offset):
    # Rows that lack a full window stay NaN
    x = np.full(array.shape, np.nan)
    for row_num in range(array.shape[0]):
        first_row = row_num - window - offset
        last_row = row_num - offset + 1  # exclusive bound: the slice spans window+1 rows
        if first_row >= 0:
            # NaN-aware mean over the selected rows, column by column
            x[row_num] = np.nanmean(array[first_row:last_row], axis=0)
    return x
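To make the window bounds concrete, the `first_row`/`last_row` arithmetic from `f` can be run on its own. `window_rows` is a hypothetical helper written only for this illustration; it lists the input rows that each output row averages.

```python
def window_rows(n_rows, window, offset):
    """Hypothetical helper: map each output row of f() to the input rows it averages."""
    rows = {}
    for row_num in range(n_rows):
        first_row = row_num - window - offset
        last_row = row_num - offset + 1   # exclusive bound of the slice
        if first_row >= 0:
            rows[row_num] = list(range(first_row, last_row))
    return rows

print(window_rows(7, 2, 1))
# {3: [0, 1, 2], 4: [1, 2, 3], 5: [2, 3, 4], 6: [3, 4, 5]}
```

Each list holds window + 1 = 3 rows and ends at row `row_num - offset`; with offset=0 the current row itself would be included.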

I've found a potential solution here, adapted below for my code:

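import math
from scipy.ndimage import uniform_filter

def g(array, window, offset):
    # Mean over window+1 rows along axis 0; each column is filtered independently.
    # origin shifts the window back so that it ends at the current row; offset is not used yet.
    return uniform_filter(array, size=(window+1, 1), mode='nearest',
                          origin=(math.ceil((window+1)/2-1), 0))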

This solution, however, has 3 problems:

  • First, I'm not sure how to implement the 'offset'.
  • Second, I'm not sure whether it is actually more efficient.
  • Third, and most importantly, it doesn't work when the input array contains np.nan. As soon as an np.nan is encountered, it propagates into every moving average whose window includes it, instead of being skipped as np.nanmean would do.
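A common way around NaN propagation (an assumption here, not something proposed in the thread) is to average the NaN-free values and the count of valid values separately: a NaN-aware mean is just the sum of the non-NaN entries divided by how many there are. A minimal NumPy check on one window:

```python
import numpy as np

# One window of three rows containing NaNs
rows = np.array([[1.0, 2.0],
                 [np.nan, 4.0],
                 [3.0, np.nan]])

values = np.where(np.isnan(rows), 0.0, rows)  # NaN replaced by 0 so it adds nothing
counts = (~np.isnan(rows)).astype(float)      # 1.0 where a real value is present

manual = values.sum(axis=0) / counts.sum(axis=0)
print(manual)                      # [2. 3.]
print(np.nanmean(rows, axis=0))    # [2. 3.]
```

Because uniform_filter is a scaled window sum, the same idea should carry over to it: filter `values` and `counts` with identical size, mode and origin and divide the two results, since the 1/size scaling cancels. Windows with no valid value come out as 0/0 = NaN, matching np.nanmean.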

Is there an efficient way to achieve this?

Update

As suggested by Ehsan, I've implemented the code below (with a small modification), which reproduces my original code's output for any offset greater than 0:

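from skimage.util import view_as_windows

def h(array, window, offset):
    pad = np.full((window + offset, array.shape[-1]), np.nan)  # rows without a full window
    means = np.nanmean(view_as_windows(array, (window + 1, array.shape[-1])), -2)  # one mean per (window+1)-row window
    return np.vstack((pad, np.vstack(means[:-offset])))  # fails for offset == 0: means[:-0] is empty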
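The offset=0 failure in `h` comes from the slice `[:-offset]`: for offset=0 it becomes `[:-0]`, and since `-0 == 0` that is `[:0]`, an empty selection. Slicing with an explicit stop index avoids the special case:

```python
import numpy as np

x = np.arange(5)
offset = 0

print(x[:-offset])            # [] -> x[:-0] is x[:0], an empty slice
print(x[:len(x) - offset])    # [0 1 2 3 4] -> explicit stop keeps every element
```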

I'm just not sure how to make it work for any offset (in particular, offset=0). Also, this solution seems to take more time than the original one:

a = np.arange(10*11).reshape(10,11)

%timeit f(a, 5, 2)
%timeit h(a, 5, 2)
>>> 36.6 µs ± 709 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> 67.5 µs ± 2.34 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

I was wondering whether there's a less time-consuming alternative.
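One loop-free option, sketched here under the assumption that the window semantics of `f` (window + 1 rows ending at row `row_num - offset`) are the ones wanted, replaces the per-window nanmean with cumulative sums: one running total of the NaN-zeroed values and one of the non-NaN counts give every window sum by a single subtraction. This is a swapped-in technique, not something from the thread; `moving_nanmean` is a hypothetical name and only NumPy is needed. It handles offset=0 and NaN inputs.

```python
import numpy as np


def moving_nanmean(array, window, offset):
    """Sketch: NaN-aware mean of rows [i - window - offset, i - offset] for each row i.

    Mirrors the loop in f(): each window holds window + 1 rows and rows without
    a full window are NaN. Assumes window >= 0 and offset >= 0 (offset=0 works).
    """
    array = np.asarray(array, dtype=float)
    n, m = array.shape
    size = window + 1                       # f() effectively averages window + 1 rows
    valid = ~np.isnan(array)
    zero_row = np.zeros((1, m))
    # Running totals with a leading zero row: csum[j] = sum of rows 0..j-1
    csum = np.vstack((zero_row, np.cumsum(np.where(valid, array, 0.0), axis=0)))
    ccnt = np.vstack((zero_row, np.cumsum(valid, axis=0)))
    sums = csum[size:] - csum[:-size]       # sums[j] covers rows j .. j + window
    cnts = ccnt[size:] - ccnt[:-size]       # number of non-NaN values in the same rows
    out = np.full((n, m), np.nan)
    start = window + offset                 # first output row with a full window
    k = max(n - start, 0)                   # number of output rows that get a value
    with np.errstate(invalid="ignore"):     # 0/0 (all-NaN window) -> NaN, like np.nanmean
        out[start:start + k] = sums[:k] / cnts[:k]
    return out
```

The Python loop is gone, but differencing running sums can accumulate more floating-point error than np.nanmean on very long columns or on values of very different magnitudes, so comparing against `f` on real data is worthwhile.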

Answer

This will give you the same output as your code. However, you may want to reconsider the extra +1 in the last_row definition: with it, array[first_row:last_row] spans window+1 rows, so your actual window size is window+1 rather than window:

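import numpy as np
from skimage.util import view_as_windows
def f(array, window, offset):
    pad = np.full((window + offset, array.shape[-1]), np.nan)  # rows without a full window
    means = np.nanmean(view_as_windows(array, (window + 1, array.shape[-1])), -2)  # NaN-aware mean of each (window+1)-row window
    return np.vstack((pad, np.vstack(means[:array.shape[0] - window - offset])))  # explicit stop also works for offset == 0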

Sample output:

a = np.arange(7*6).reshape(7,6)
f(a, 2, 1)
#[[nan nan nan nan nan nan]
# [nan nan nan nan nan nan]
# [nan nan nan nan nan nan]
# [ 6.  7.  8.  9. 10. 11.]
# [12. 13. 14. 15. 16. 17.]
# [18. 19. 20. 21. 22. 23.]
# [24. 25. 26. 27. 28. 29.]]
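If scikit-image is not already a dependency, NumPy 1.20+ provides numpy.lib.stride_tricks.sliding_window_view, which builds a comparable strided view. The sketch below assumes the same semantics as the answer's `f` (windows of window + 1 rows) and that window + 1 does not exceed the number of rows, since sliding_window_view raises a ValueError otherwise; `f_swv` is a hypothetical name. Writing into a NaN-filled array with an explicit stop index also sidesteps the offset=0 issue.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def f_swv(array, window, offset):
    """Sketch of the answer's f() without scikit-image (assumes NumPy >= 1.20)."""
    n, m = array.shape
    out = np.full((n, m), np.nan)
    # Read-only view of shape (n - window, m, window + 1); the window axis comes last
    windows = sliding_window_view(array, window + 1, axis=0)
    start = window + offset          # first output row that has a full window
    k = max(n - start, 0)            # number of output rows that receive a value
    out[start:start + k] = np.nanmean(windows[:k], axis=-1)
    return out


a = np.arange(7 * 6).reshape(7, 6)
print(f_swv(a, 2, 1))
```

For this 7 × 6 input it should print the same rows as the sample output above.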

Comparison using benchit:

# OP's solution
def f1(array, window, offset):
    x = np.full(array.shape, np.nan)
    for row_num in range(array.shape[0]):
        first_row = row_num - window - offset
        last_row = row_num - offset + 1
        if first_row >= 0:
            x[row_num] = np.nanmean(array[first_row:last_row], axis=0)
    return x

# Ehsan's solution
def f2(array, window, offset):
    pad = np.full((window + offset, array.shape[-1]), np.nan)
    means = np.nanmean(view_as_windows(array, (window + 1, array.shape[-1])), -2)
    return np.vstack((pad, np.vstack(means[:array.shape[0] - window - offset])))

# Benchmark inputs: n rows x 10 columns, window=2, offset=2
in_ = {n:[np.arange(n*10).reshape(n,10), 2,2] for n in [10,100,500,1000,4000]}

The proposed solution f2 is significantly faster. Note, however, that most vectorized solutions only pay off on larger arrays.

[Figure: benchit timing comparison of f1 and f2]
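benchit is a third-party package, and the snippet above shows only its inputs. Assuming the goal is simply to reproduce the comparison, a sketch using the standard library's timeit could look like this. It uses numpy.lib.stride_tricks.sliding_window_view (NumPy 1.20+) in place of skimage's view_as_windows, so `vectorized_version` is a stand-in for f2 rather than the answer's exact code. The printed timings depend on the machine and are not reproduced here.

```python
import timeit

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def loop_version(array, window, offset):
    # Same logic as f1: one np.nanmean call per output row
    x = np.full(array.shape, np.nan)
    for row_num in range(array.shape[0]):
        first_row = row_num - window - offset
        if first_row >= 0:
            x[row_num] = np.nanmean(array[first_row:row_num - offset + 1], axis=0)
    return x


def vectorized_version(array, window, offset):
    # Same output as f2, built on sliding_window_view instead of skimage
    n, m = array.shape
    x = np.full((n, m), np.nan)
    start = window + offset
    k = max(n - start, 0)
    windows = sliding_window_view(array, window + 1, axis=0)
    x[start:start + k] = np.nanmean(windows[:k], axis=-1)
    return x


# Same shapes as the benchit inputs above: n rows x 10 columns, window=2, offset=2
for n in [10, 100, 500, 1000, 4000]:
    a = np.arange(n * 10, dtype=float).reshape(n, 10)
    assert np.allclose(loop_version(a, 2, 2), vectorized_version(a, 2, 2), equal_nan=True)
    per_call = {}
    for name, func in [("loop", loop_version), ("vectorized", vectorized_version)]:
        # best of 3 repeats of 3 calls each, reported per call
        per_call[name] = min(timeit.repeat(lambda: func(a, 2, 2), number=3, repeat=3)) / 3
    print(f"n={n:5d}  loop: {per_call['loop'] * 1e6:10.1f} us  "
          f"vectorized: {per_call['vectorized'] * 1e6:10.1f} us")
```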
