How to average over conditionally selected numpy array entries in a 4D array based on an index from a 3D array

I want to average over conditionally selected elements in a 4D numpy array, selecting them with an index taken from a 3D array.

In other words, my 4D array DATA has these dimensions: [ntime,nz,ny,nx]

whereas my 3D array COND, which I use for the conditional sampling, is only a function of [ntime,ny,nx] (with the same number of time slices, x points and y points as DATA).

I want to use broadcasting, i.e. something like DATA[COND[None,...]], but the problem is that the "missing" vertical dimension is not at the left or right; it sits in the middle, between time and the y and x space dimensions. I could loop over the vertical levels, but I think that would be slow. Is there a way of somehow indexing DATA as

DATA[COND[times],:,COND[ys],COND[xs]]?

Setting up some dummy arrays:

import numpy as np

np.random.seed(1234)
COND=np.random.randint(0,2,(2,3,3))  # 2 time levels, 3 y points and 3 x points
DATA=np.random.randint(0,100,(2,2,3,3)) # 2 time levels, 2 z levels, 3 y points and 3 x points

giving:

COND
array([[[1, 1, 0],
        [1, 0, 0],
        [0, 1, 1]],

       [[1, 1, 1],
        [0, 0, 1],
        [0, 0, 0]]])

DATA
array([[[[26, 58, 92],
         [69, 80, 73],
         [47, 50, 76]],

        [[37, 34, 38],
         [67, 11,  0],
         [75, 80,  3]]],


       [[[ 2, 19, 12],
         [65, 75, 81],
         [14, 71, 60]],

        [[46, 28, 81],
         [87, 13, 96],
         [12, 69, 95]]]])
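With these arrays, the loop over vertical levels that the question wants to avoid can still serve as a reference result for checking any vectorised version. A minimal sketch, with the example values written out literally so it does not depend on the random generator; `mean_per_level_loop` is a hypothetical helper name, not part of NumPy:

```python
import numpy as np

# The example arrays from the question, written out literally
COND = np.array([[[1, 1, 0], [1, 0, 0], [0, 1, 1]],
                 [[1, 1, 1], [0, 0, 1], [0, 0, 0]]])
DATA = np.array([[[[26, 58, 92], [69, 80, 73], [47, 50, 76]],
                  [[37, 34, 38], [67, 11,  0], [75, 80,  3]]],
                 [[[ 2, 19, 12], [65, 75, 81], [14, 71, 60]],
                  [[46, 28, 81], [87, 13, 96], [12, 69, 95]]]])


def mean_per_level_loop(data, cond):
    """Hypothetical helper: average data[ntime, nz, ny, nx] over the
    (time, y, x) points where cond[ntime, ny, nx] is set, one value per z level."""
    mask = cond.astype(bool)
    nz = data.shape[1]
    out = np.empty(nz)
    for k in range(nz):
        # data[:, k] has shape (ntime, ny, nx), the same shape as the mask
        out[k] = data[:, k][mask].mean()
    return out


print(mean_per_level_loop(DATA, COND))
```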

I can find the indices using argwhere:

idx=np.argwhere(COND==1)
array([[0, 0, 0],
       [0, 0, 1],
       [0, 1, 0],
       [0, 2, 1],
       [0, 2, 2],
       [1, 0, 0],
       [1, 0, 1],
       [1, 0, 2],
       [1, 1, 2]])

Now I want to do something like

np.mean(DATA[idx[...,None,...]])

or

np.mean(DATA[idx[0],None,idx[1],idx[2]])

which should give me an answer with 2 numbers, one per vertical level, each being the mean DATA value over the times and x and y points where COND=1.

This question is related to this one: filtering a 3D numpy array according to 2D numpy array

but my klev (vertical level) index is in the middle, not on the left or right, so I can't use the [...,None] solution.
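The None trick is not limited to the ends of the shape: None (an alias for np.newaxis) can be inserted at any position, so COND[:, None] gives a (ntime, 1, ny, nx) mask that broadcasts against DATA. This is an alternative that does not appear in the original thread, sketched under the assumption of NumPy 1.20 or newer (needed for the `where=` argument of `np.mean`):

```python
import numpy as np

# The example arrays from the question, written out literally
COND = np.array([[[1, 1, 0], [1, 0, 0], [0, 1, 1]],
                 [[1, 1, 1], [0, 0, 1], [0, 0, 0]]])
DATA = np.array([[[[26, 58, 92], [69, 80, 73], [47, 50, 76]],
                  [[37, 34, 38], [67, 11,  0], [75, 80,  3]]],
                 [[[ 2, 19, 12], [65, 75, 81], [14, 71, 60]],
                  [[46, 28, 81], [87, 13, 96], [12, 69, 95]]]])

# Insert the missing vertical axis in the middle:
# (ntime, ny, nx) -> (ntime, 1, ny, nx), which broadcasts against (ntime, nz, ny, nx)
mask = COND[:, None, :, :].astype(bool)

# Option 1: masked mean over time, y and x, leaving one value per z level
# (the where= argument of np.mean requires NumPy >= 1.20)
mean_where = DATA.mean(axis=(0, 2, 3), where=mask)

# Option 2: plain arithmetic. Unselected points are multiplied by False (0);
# mask.sum() counts the selected (t, y, x) points, the same count for every z level.
mean_sum = (DATA * mask).sum(axis=(0, 2, 3)) / mask.sum()

print(mean_where)
print(mean_sum)
```

Both options avoid building index arrays; the second also works on NumPy versions without `where=`.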

Using zip to get indices along each axis

IIUC (if I understand correctly), you have already done most of the work, i.e. idx. Unzipping it gives one index sequence per axis:

>>> [*zip(*idx)]
[(0, 0, 0, 0, 0, 1, 1, 1, 1),
 (0, 0, 1, 2, 2, 0, 0, 0, 1),
 (0, 1, 0, 1, 2, 0, 1, 2, 2)]

>>> t, y, x = zip(*idx)
>>> DATA[t, :, y, x]

array([[26, 37],
       [58, 34],
       [69, 67],
       [50, 80],
       [76,  3],
       [ 2, 46],
       [19, 28],
       [12, 81],
       [81, 96]])

>>> DATA[t, :, y, x].mean(0)
array([43.66666667, 52.44444444])
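Note that DATA[t, :, y, x] has shape (9, 2) rather than (2, 9): when advanced indices are separated by a slice, NumPy places the broadcast index dimension first in the result, which is why the mean is taken over axis 0. The sketch below shows the same selection, using idx.T instead of zip to split the columns (an equivalent alternative that keeps the indices as NumPy arrays):

```python
import numpy as np

# The example arrays from the question, written out literally
COND = np.array([[[1, 1, 0], [1, 0, 0], [0, 1, 1]],
                 [[1, 1, 1], [0, 0, 1], [0, 0, 0]]])
DATA = np.array([[[[26, 58, 92], [69, 80, 73], [47, 50, 76]],
                  [[37, 34, 38], [67, 11,  0], [75, 80,  3]]],
                 [[[ 2, 19, 12], [65, 75, 81], [14, 71, 60]],
                  [[46, 28, 81], [87, 13, 96], [12, 69, 95]]]])

idx = np.argwhere(COND == 1)   # shape (9, 3): one (t, y, x) row per selected point

# Transposing gives three length-9 index arrays, the same values as zip(*idx)
t, y, x = idx.T

selected = DATA[t, :, y, x]
# t, y and x are separated by the slice ':', so the index dimension (9)
# comes first and the remaining z axis (2) comes last
print(selected.shape)          # (9, 2)
print(selected.mean(axis=0))
```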

Get indices using np.where

An easier way to get the indices is numpy.where:

>>> np.where(COND)
(array([0, 0, 0, 0, 0, 1, 1, 1, 1], dtype=int64),
 array([0, 0, 1, 2, 2, 0, 0, 0, 1], dtype=int64),
 array([0, 1, 0, 1, 2, 0, 1, 2, 2], dtype=int64))
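Because np.where(COND) already returns one index array per axis of COND, the tuple can be unpacked straight into the indexing expression with no argwhere/zip step. A short sketch with the question's arrays:

```python
import numpy as np

# The example arrays from the question, written out literally
COND = np.array([[[1, 1, 0], [1, 0, 0], [0, 1, 1]],
                 [[1, 1, 1], [0, 0, 1], [0, 0, 0]]])
DATA = np.array([[[[26, 58, 92], [69, 80, 73], [47, 50, 76]],
                  [[37, 34, 38], [67, 11,  0], [75, 80,  3]]],
                 [[[ 2, 19, 12], [65, 75, 81], [14, 71, 60]],
                  [[46, 28, 81], [87, 13, 96], [12, 69, 95]]]])

# One index array per axis of COND: time, y and x
t, y, x = np.where(COND)

# Keep every vertical level with ':' and average over the selected points
result = DATA[t, :, y, x].mean(axis=0)
print(result)
```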

Get indices using np.nonzero

Or numpy.nonzero, probably the most explicit (np.where with a single argument is documented as a shorthand for np.asarray(condition).nonzero()):

>>> np.nonzero(COND)
(array([0, 0, 0, 0, 0, 1, 1, 1, 1], dtype=int64),
 array([0, 0, 1, 2, 2, 0, 0, 0, 1], dtype=int64),
 array([0, 1, 0, 1, 2, 0, 1, 2, 2], dtype=int64))

Use the condition array directly

Notably, a handy trick when dealing with ndarrays is numpy.transpose. As you would have seen in the linked post, a boolean index is aligned with the leftmost dimensions of the array it indexes. Your array in its current form is not suitable for that kind of indexing, but if the aggregation dimension were at the very right and the indexed dimensions to its left, that would do the trick.

So, if your data could be reordered so that, instead of:

dim = (2, 2, 3, 3)
axis-> 0, 1, 2, 3

it were:

dim = (2, 3, 3, 2)
axis-> 0, 2, 3, 1

then indexing with COND directly would have worked.
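To see why the reordering matters: a boolean mask with k dimensions must match the first k axes of the indexed array, and those k axes collapse into a single axis whose length is the number of True entries. A sketch comparing the two layouts with the question's arrays:

```python
import numpy as np

# The example arrays from the question, written out literally
COND = np.array([[[1, 1, 0], [1, 0, 0], [0, 1, 1]],
                 [[1, 1, 1], [0, 0, 1], [0, 0, 0]]])
DATA = np.array([[[[26, 58, 92], [69, 80, 73], [47, 50, 76]],
                  [[37, 34, 38], [67, 11,  0], [75, 80,  3]]],
                 [[[ 2, 19, 12], [65, 75, 81], [14, 71, 60]],
                  [[46, 28, 81], [87, 13, 96], [12, 69, 95]]]])

mask = COND == 1                                   # shape (2, 3, 3)

# Original layout (ntime, nz, ny, nx): the mask is matched against the first
# three axes (2, 2, 3), whose sizes differ from (2, 3, 3), so NumPy raises IndexError
raised = False
try:
    DATA[mask]
except IndexError as err:
    raised = True
    print("IndexError:", err)

# Reordered layout (ntime, ny, nx, nz): the mask lines up with the first three
# axes, which collapse into one axis of length 9; the z axis is left untouched
reordered = np.transpose(DATA, axes=(0, 2, 3, 1))
picked = reordered[mask]
print(reordered.shape, mask.shape, picked.shape)   # (2, 3, 3, 2) (2, 3, 3) (9, 2)
print(picked.mean(axis=0))
```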

Reorder axes using np.transpose

You can use numpy.transpose for that:

>>> np.transpose(DATA, axes=(0,2,3,1))[COND==1].mean(axis=0)
array([43.66666667, 52.44444444])

Roll axes using np.rollaxis

You could also roll axis 1 to the end (i.e. make it the 4th dimension) using numpy.rollaxis. Note that np.rollaxis is kept mainly for backward compatibility; the NumPy documentation recommends np.moveaxis instead:

>>> np.rollaxis(DATA, 1, 4)[COND==1].mean(0)
array([43.66666667, 52.44444444])

Move axes using np.moveaxis

Or, you could move the axis from a source position to a destination position, i.e. move axis 1 to axis 3, using np.moveaxis:

>>> np.moveaxis(DATA, source=1, destination=3)[COND==1].mean(0)
array([43.66666667, 52.44444444]) 
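The three reorderings above are interchangeable here: each returns a view of DATA with the vertical axis moved last, and no data is copied until the boolean mask is applied. A quick check, assuming the question's arrays:

```python
import numpy as np

# The example arrays from the question, written out literally
COND = np.array([[[1, 1, 0], [1, 0, 0], [0, 1, 1]],
                 [[1, 1, 1], [0, 0, 1], [0, 0, 0]]])
DATA = np.array([[[[26, 58, 92], [69, 80, 73], [47, 50, 76]],
                  [[37, 34, 38], [67, 11,  0], [75, 80,  3]]],
                 [[[ 2, 19, 12], [65, 75, 81], [14, 71, 60]],
                  [[46, 28, 81], [87, 13, 96], [12, 69, 95]]]])

a = np.transpose(DATA, axes=(0, 2, 3, 1))           # explicit axis order
b = np.rollaxis(DATA, 1, 4)                         # legacy: roll axis 1 to the end
c = np.moveaxis(DATA, source=1, destination=-1)     # -1 means "last", same as 3 here

print(a.shape, b.shape, c.shape)
print(np.array_equal(a, b), np.array_equal(a, c))

# All three share memory with DATA, i.e. they are views rather than copies
print(np.shares_memory(a, DATA), np.shares_memory(b, DATA), np.shares_memory(c, DATA))

print(c[COND == 1].mean(axis=0))
```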
