How to average over conditionally selected numpy array entries in a 4D array based on an index from a 3D array
I want to average over conditionally selected elements in a 4D numpy array, based on an index using a 3D array.
In other words, my 4D array DATA has these dimensions: [ntime,nz,ny,nx]
whereas my 3D array COND, which I use to conditionally sample, is only a function of [ntime,ny,nx] (with the number of time slices, x and y points identical).
I want to use broadcasting, with something like
DATA[COND[None,...]]
But the problem is that the "missing" vertical dimension is not at the right, but in the middle, between time and the x and y dimensions. I could loop over the vertical levels, but I think that would be slow. Is there a way of somehow indexing DATA as
DATA[COND[times],:,COND[ys],COND[xs]]?
Setting up some dummy arrays:
import numpy as np
np.random.seed(1234)
COND=np.random.randint(0,2,(2,3,3)) # 2 time levels, 3 x points and 3 y points
DATA=np.random.randint(0,100,(2,2,3,3)) # 2 time levels, 2 z levels, and 3 x and y points
giving:
COND
array([[[1, 1, 0],
        [1, 0, 0],
        [0, 1, 1]],

       [[1, 1, 1],
        [0, 0, 1],
        [0, 0, 0]]])
DATA
array([[[[26, 58, 92],
         [69, 80, 73],
         [47, 50, 76]],

        [[37, 34, 38],
         [67, 11,  0],
         [75, 80,  3]]],


       [[[ 2, 19, 12],
         [65, 75, 81],
         [14, 71, 60]],

        [[46, 28, 81],
         [87, 13, 96],
         [12, 69, 95]]]])
I can find the indices where the condition holds using argwhere:
idx=np.argwhere(COND==1)
array([[0, 0, 0],
[0, 0, 1],
[0, 1, 0],
[0, 2, 1],
[0, 2, 2],
[1, 0, 0],
[1, 0, 1],
[1, 0, 2],
[1, 1, 2]])
Now I want to do something like
np.mean(DATA[idx[...,None,...]])
or
np.mean(DATA[idx[0],None,idx[1],idx[2]])
which should give me an answer with 2 numbers (one per vertical level), corresponding to the mean DATA values over the times, x and y points where COND=1.
This question is related to this one: filtering a 3D numpy array according to 2D numpy array
but my klev index is in the middle and not at the left or right, so I can't use the [...,None] solution.
zip to get indices along each axis
IIUC, you have done most of the work, i.e. idx:
>>> [*zip(*idx)]
[(0, 0, 0, 0, 0, 1, 1, 1, 1),
(0, 0, 1, 2, 2, 0, 0, 0, 1),
(0, 1, 0, 1, 2, 0, 1, 2, 2)]
>>> t, y, x = zip(*idx)
>>> DATA[t, :, y, x]
array([[26, 37],
[58, 34],
[69, 67],
[50, 80],
[76, 3],
[ 2, 46],
[19, 28],
[12, 81],
[81, 96]])
>>> DATA[t, :, y, x].mean(0)
array([43.66666667, 52.44444444])
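For anyone who wants to reproduce the steps above without relying on the random seed, here is a minimal sketch that hard-codes the COND and DATA values printed in the question. The names `selected` and `level_means` are illustrative, not from the original post, and `idx.T` is used as an equivalent of `zip(*idx)`.

```python
import numpy as np

# Values copied from the question so the result does not depend on the RNG.
COND = np.array([[[1, 1, 0], [1, 0, 0], [0, 1, 1]],
                 [[1, 1, 1], [0, 0, 1], [0, 0, 0]]])
DATA = np.array([[[[26, 58, 92], [69, 80, 73], [47, 50, 76]],
                  [[37, 34, 38], [67, 11, 0], [75, 80, 3]]],
                 [[[2, 19, 12], [65, 75, 81], [14, 71, 60]],
                  [[46, 28, 81], [87, 13, 96], [12, 69, 95]]]])

idx = np.argwhere(COND == 1)   # shape (9, 3): one (t, y, x) row per selected point
t, y, x = idx.T                # three 1-D index arrays, same as zip(*idx)

# The integer index arrays are separated by a slice, so NumPy puts the
# "selected points" dimension first and the sliced z dimension last.
selected = DATA[t, :, y, x]    # shape (9, 2)
level_means = selected.mean(axis=0)

print(selected.shape)
print(level_means)
```

The leading axis of `selected` runs over the 9 points where COND is 1, which is why averaging over axis 0 leaves one mean per vertical level.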
np.where to get indices
An easier way to get the indices is numpy.where:
>>> np.where(COND)
(array([0, 0, 0, 0, 0, 1, 1, 1, 1], dtype=int64),
array([0, 0, 1, 2, 2, 0, 0, 0, 1], dtype=int64),
array([0, 1, 0, 1, 2, 0, 1, 2, 2], dtype=int64))
Or, numpy.nonzero, probably the most explicit:
>>> np.nonzero(COND)
(array([0, 0, 0, 0, 0, 1, 1, 1, 1], dtype=int64),
array([0, 0, 1, 2, 2, 0, 0, 0, 1], dtype=int64),
array([0, 1, 0, 1, 2, 0, 1, 2, 2], dtype=int64))
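As a quick check that the three ways of getting the indices agree, the sketch below compares `np.where(COND)`, `np.nonzero(COND)` and the transposed `np.argwhere` result on the COND array from the question. The variable names are illustrative. The `dtype=int64` suffix in the output above depends on platform and NumPy version, so it may not appear on every system.

```python
import numpy as np

COND = np.array([[[1, 1, 0], [1, 0, 0], [0, 1, 1]],
                 [[1, 1, 1], [0, 0, 1], [0, 0, 0]]])

from_where = np.where(COND)               # tuple of 3 index arrays (t, y, x)
from_nonzero = np.nonzero(COND)           # same tuple, by a more explicit name
from_argwhere = tuple(np.argwhere(COND).T)  # argwhere rows, regrouped per axis

all_agree = all(
    np.array_equal(a, b) and np.array_equal(a, c)
    for a, b, c in zip(from_where, from_nonzero, from_argwhere)
)
print(all_agree)
```

Any of the three tuples can be unpacked as `t, y, x` and used in `DATA[t, :, y, x]`.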
Notably, a handy trick when dealing with ndarrays is numpy.transpose. As you will have seen in the post linked in your question, dimensions are left-aligned when indexing, and your array in its current form is not suitable for that kind of indexing. If your aggregation dimension were at the very right and the index dimensions were to the left, that would do the trick.
So, if your data could be reordered so that, instead of
dim = (2, 2, 3, 3)
axis-> 0, 1, 2, 3
it were
dim = (2, 3, 3, 2)
axis-> 0, 2, 3, 1
it would have worked.
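The left-alignment rule can be seen with placeholder arrays. This is only a sketch: the zero-filled arrays and the names `original_layout`, `reordered_layout` and `mask_fits_original` are made up for illustration, and only the shapes matter.

```python
import numpy as np

# Boolean mask with the question's COND values, shape (ntime, ny, nx).
mask = np.array([[[1, 1, 0], [1, 0, 0], [0, 1, 1]],
                 [[1, 1, 1], [0, 0, 1], [0, 0, 0]]], dtype=bool)

original_layout = np.zeros((2, 2, 3, 3))   # (ntime, nz, ny, nx)
reordered_layout = np.zeros((2, 3, 3, 2))  # (ntime, ny, nx, nz)

# A boolean mask is matched against the leading axes of the indexed array,
# so it does not fit when nz sits between ntime and ny.
try:
    original_layout[mask]
    mask_fits_original = True
except IndexError:
    mask_fits_original = False

picked = reordered_layout[mask]  # mask consumes the first three axes

print(mask_fits_original)
print(picked.shape)
```

With nz moved to the end, the mask covers the first three axes and the remaining axis holds the values per vertical level for each selected point.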
np.transpose to reorder axes
You can use numpy.transpose for that:
>>> np.transpose(DATA, axes=(0,2,3,1))[COND==1].mean(axis=0)
array([43.66666667, 52.44444444])
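To confirm that the transposed, masked mean matches the loop over vertical levels that the question wanted to avoid, here is a small self-contained comparison using the values from the question. The names `fast` and `looped` are illustrative. Note that the mask has to be boolean (hence `COND == 1`); indexing with the integer array COND directly would be treated as integer indexing rather than as a mask.

```python
import numpy as np

COND = np.array([[[1, 1, 0], [1, 0, 0], [0, 1, 1]],
                 [[1, 1, 1], [0, 0, 1], [0, 0, 0]]])
DATA = np.array([[[[26, 58, 92], [69, 80, 73], [47, 50, 76]],
                  [[37, 34, 38], [67, 11, 0], [75, 80, 3]]],
                 [[[2, 19, 12], [65, 75, 81], [14, 71, 60]],
                  [[46, 28, 81], [87, 13, 96], [12, 69, 95]]]])

mask = COND == 1  # boolean, shape (ntime, ny, nx)

# Vectorised: move nz to the end, then mask the three leading axes.
reordered = np.transpose(DATA, axes=(0, 2, 3, 1))  # (ntime, ny, nx, nz)
fast = reordered[mask].mean(axis=0)

# Reference: explicit loop over the vertical levels.
looped = np.array([DATA[:, k][mask].mean() for k in range(DATA.shape[1])])

print(fast)
print(np.allclose(fast, looped))
```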
np.rollaxis to roll the axis
You could also roll your axis (axis 1) to the end (i.e. the 4th dimension), using numpy.rollaxis:
>>> np.rollaxis(DATA, 1, 4)[COND==1].mean(0)
array([43.66666667, 52.44444444])
np.moveaxis to move the axis
Or, you could move your axis from the source dimension to the destination dimension, i.e. move axis 1 to axis 3, using np.moveaxis:
>>> np.moveaxis(DATA, source=1, destination=3)[COND==1].mean(0)
array([43.66666667, 52.44444444])
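The three reorderings above should be interchangeable here. The sketch below uses an arbitrary `np.arange` array as a stand-in for DATA (an assumption for illustration only) and checks that all three produce the same array and that each is a view sharing memory with the input, so none of them copies the data. As far as I know, the NumPy documentation recommends `moveaxis` over `rollaxis` for new code.

```python
import numpy as np

# Stand-in for DATA with the same layout: (ntime, nz, ny, nx).
DATA = np.arange(2 * 2 * 3 * 3).reshape(2, 2, 3, 3)

via_transpose = np.transpose(DATA, axes=(0, 2, 3, 1))
via_rollaxis = np.rollaxis(DATA, 1, 4)                    # roll axis 1 to the end
via_moveaxis = np.moveaxis(DATA, source=1, destination=3)

same_result = (np.array_equal(via_transpose, via_rollaxis)
               and np.array_equal(via_transpose, via_moveaxis))

# Each result is a view on DATA, not a copy.
all_views = all(np.shares_memory(v, DATA)
                for v in (via_transpose, via_rollaxis, via_moveaxis))

print(via_moveaxis.shape)
print(same_result, all_views)
```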