I want to average over conditionally selected elements in a 4D numpy array, based on an index using a 3D array.
In other words, my 4D array DATA has these dimensions: [ntime,nz,ny,nx]
whereas my 3D array COND, which I use to conditionally sample, is only a function of [ntime,ny,nx] (with the same number of time slices and x and y points).
I want to use broadcasting, i.e. something like DATA[COND[None,...]]
But the problem is that the "missing" vertical dimension is not at the left or right end, but in the middle, between time and the x and y space. I could loop over the vertical levels, but I think that would be slow. Is there a way of somehow indexing DATA as
DATA[cond[times],:,COND[ys],COND[xs]]?
Setting up some dummy arrays:
import numpy as np
np.random.seed(1234)
COND=np.random.randint(0,2,(2,3,3)) # 2 time levels, 3 y points and 3 x points
DATA=np.random.randint(0,100,(2,2,3,3)) # 2 time levels, 2 z levels, and 3 y and 3 x points
giving:
COND
array([[[1, 1, 0],
[1, 0, 0],
[0, 1, 1]],
[[1, 1, 1],
[0, 0, 1],
[0, 0, 0]]])
DATA
array([[[[26, 58, 92],
[69, 80, 73],
[47, 50, 76]],
[[37, 34, 38],
[67, 11, 0],
[75, 80, 3]]],
[[[ 2, 19, 12],
[65, 75, 81],
[14, 71, 60]],
[[46, 28, 81],
[87, 13, 96],
[12, 69, 95]]]])
I can find the indices where COND is 1 using argwhere:
idx=np.argwhere(COND==1)
array([[0, 0, 0],
[0, 0, 1],
[0, 1, 0],
[0, 2, 1],
[0, 2, 2],
[1, 0, 0],
[1, 0, 1],
[1, 0, 2],
[1, 1, 2]])
Now I want to do something like
np.mean(DATA[idx[...,None,...]])
or
np.mean(DATA[idx[0],None,idx[1],idx[2]])
which should give me an answer with 2 numbers (one per vertical level), corresponding to the mean DATA values at the times and x and y points where COND=1
This question is related to this one: filtering a 3D numpy array according to 2D numpy array
but my klev index is in the middle and not on the left or right, so I can't use the [...,None] solution.
IIUC, you have done most of the work already, i.e. idx.
zip
Use zip to get the indices along each axis:
>>> [*zip(*idx)]
[(0, 0, 0, 0, 0, 1, 1, 1, 1),
(0, 0, 1, 2, 2, 0, 0, 0, 1),
(0, 1, 0, 1, 2, 0, 1, 2, 2)]
>>> t, y, x = zip(*idx)
>>> DATA[t, :, y, x]
array([[26, 37],
[58, 34],
[69, 67],
[50, 80],
[76, 3],
[ 2, 46],
[19, 28],
[12, 81],
[81, 96]])
>>> DATA[t, :, y, x].mean(0)
array([43.66666667, 52.44444444])
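For anyone who wants to run this end to end, here is a minimal sketch of the same idea. It assumes the dummy arrays from the question, hard-coded here so the result does not depend on the random generator, and it unpacks idx.T instead of using zip. The names selected and level_means are only illustrative.

```python
import numpy as np

# Dummy arrays copied from the question (hard-coded for reproducibility).
COND = np.array([[[1, 1, 0], [1, 0, 0], [0, 1, 1]],
                 [[1, 1, 1], [0, 0, 1], [0, 0, 0]]])
DATA = np.array([[[[26, 58, 92], [69, 80, 73], [47, 50, 76]],
                  [[37, 34, 38], [67, 11, 0], [75, 80, 3]]],
                 [[[2, 19, 12], [65, 75, 81], [14, 71, 60]],
                  [[46, 28, 81], [87, 13, 96], [12, 69, 95]]]])

idx = np.argwhere(COND == 1)  # shape (9, 3): one (t, y, x) row per selected point
t, y, x = idx.T               # transpose, then unpack into three 1-D index arrays

# The slice sits between the integer index arrays, so NumPy places the
# selected-points axis first: the result has shape (npoints, nz) = (9, 2).
selected = DATA[t, :, y, x]
level_means = selected.mean(axis=0)  # one mean per vertical level

print(selected.shape)
print(level_means)
```

Unpacking idx.T keeps the indices as NumPy arrays rather than Python tuples, which should matter little for small inputs but avoids a round trip through Python objects for large ones.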
np.where
An easier way to get the indices is numpy.where:
>>> np.where(COND)
(array([0, 0, 0, 0, 0, 1, 1, 1, 1], dtype=int64),
array([0, 0, 1, 2, 2, 0, 0, 0, 1], dtype=int64),
array([0, 1, 0, 1, 2, 0, 1, 2, 2], dtype=int64))
Or numpy.nonzero, probably the most explicit:
>>> np.nonzero(COND)
(array([0, 0, 0, 0, 0, 1, 1, 1, 1], dtype=int64),
array([0, 0, 1, 2, 2, 0, 0, 0, 1], dtype=int64),
array([0, 1, 0, 1, 2, 0, 1, 2, 2], dtype=int64))
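The tuple returned by np.nonzero (or np.where) can be unpacked straight into the index expression. The sketch below assumes the hard-coded dummy arrays from the question and compares the result with the explicit loop over vertical levels that the question wanted to avoid. The names means and loop_means are illustrative.

```python
import numpy as np

# Dummy arrays copied from the question (hard-coded for reproducibility).
COND = np.array([[[1, 1, 0], [1, 0, 0], [0, 1, 1]],
                 [[1, 1, 1], [0, 0, 1], [0, 0, 0]]])
DATA = np.array([[[[26, 58, 92], [69, 80, 73], [47, 50, 76]],
                  [[37, 34, 38], [67, 11, 0], [75, 80, 3]]],
                 [[[2, 19, 12], [65, 75, 81], [14, 71, 60]],
                  [[46, 28, 81], [87, 13, 96], [12, 69, 95]]]])

# One 1-D index array per axis of COND: time, y, x.
t, y, x = np.nonzero(COND)
means = DATA[t, :, y, x].mean(axis=0)

# Reference implementation: loop over the vertical levels and mask each one.
nz = DATA.shape[1]
loop_means = np.array([DATA[:, k][COND == 1].mean() for k in range(nz)])

print(means)
print(loop_means)
```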
Notably, a handy trick when dealing with ndarrays is numpy.transpose. As you will have seen in the post linked in your question, a boolean mask is aligned with the dimensions from the left when indexing, and your array in its current layout is not suitable for that kind of indexing. If the vertical dimension were at the far right and the masked dimensions (time, y, x) were on the left, that would do the trick.
So, if your data could be reordered so that instead of:
dim = (2, 2, 3, 3)
axis-> 0, 1, 2, 3
it were:
dim = (2, 3, 3, 2)
axis-> 0, 2, 3, 1
it would work.
np.transpose
You can use numpy.transpose for that:
>>> np.transpose(DATA, axes=(0,2,3,1))[COND==1].mean(axis=0)
array([43.66666667, 52.44444444])
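To make the shapes explicit, here is a short sketch of what happens at each step, again assuming the hard-coded dummy arrays from the question. The names reordered, mask and picked are illustrative. As far as I know, np.transpose returns a view here, so reordering the axes does not copy DATA.

```python
import numpy as np

# Dummy arrays copied from the question (hard-coded for reproducibility).
COND = np.array([[[1, 1, 0], [1, 0, 0], [0, 1, 1]],
                 [[1, 1, 1], [0, 0, 1], [0, 0, 0]]])
DATA = np.array([[[[26, 58, 92], [69, 80, 73], [47, 50, 76]],
                  [[37, 34, 38], [67, 11, 0], [75, 80, 3]]],
                 [[[2, 19, 12], [65, 75, 81], [14, 71, 60]],
                  [[46, 28, 81], [87, 13, 96], [12, 69, 95]]]])

# (ntime, nz, ny, nx) -> (ntime, ny, nx, nz): the vertical axis goes last.
reordered = np.transpose(DATA, axes=(0, 2, 3, 1))
print(reordered.shape)                    # (2, 3, 3, 2)
print(np.shares_memory(DATA, reordered))  # True if it is a view, not a copy

# A boolean mask of shape (ntime, ny, nx) now lines up with the first three
# axes and collapses them into one axis of selected points.
mask = COND == 1
picked = reordered[mask]
print(picked.shape)                       # (9, 2) = (npoints, nz)

means = picked.mean(axis=0)               # average over the selected points
print(means)
```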
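np.rollaxis
You could also roll your axis (==1) to the end (i.e. the 4th dimension) using numpy.rollaxis (kept for backward compatibility; the NumPy documentation recommends moveaxis for new code):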
>>> np.rollaxis(DATA, 1, 4)[COND==1].mean(0)
array([43.66666667, 52.44444444])
np.moveaxis
Or you could move your axis from a source dimension to a destination dimension, i.e. move axis 1 to axis 3, using np.moveaxis:
>>> np.moveaxis(DATA, source=1, destination=3)[COND==1].mean(0)
array([43.66666667, 52.44444444])
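Since the question asked specifically about broadcasting, here is a hedged sketch of that route as well: insert the missing vertical axis in the middle of the mask with COND[:, None, :, :] and reduce over the time, y and x axes. It assumes the hard-coded dummy arrays from the question and, for the where= variant, NumPy 1.20 or newer. The names mask, means_where and means_sum are illustrative.

```python
import numpy as np

# Dummy arrays copied from the question (hard-coded for reproducibility).
COND = np.array([[[1, 1, 0], [1, 0, 0], [0, 1, 1]],
                 [[1, 1, 1], [0, 0, 1], [0, 0, 0]]])
DATA = np.array([[[[26, 58, 92], [69, 80, 73], [47, 50, 76]],
                  [[37, 34, 38], [67, 11, 0], [75, 80, 3]]],
                 [[[2, 19, 12], [65, 75, 81], [14, 71, 60]],
                  [[46, 28, 81], [87, 13, 96], [12, 69, 95]]]])

# Add a length-1 vertical axis in the middle: (ntime, ny, nx) -> (ntime, 1, ny, nx).
# This broadcasts against DATA's shape (ntime, nz, ny, nx).
mask = (COND == 1)[:, None, :, :]

# Variant 1: masked mean over time, y and x (assumes NumPy >= 1.20 for `where`).
means_where = DATA.mean(axis=(0, 2, 3), where=mask)

# Variant 2: the same thing with plain sums, which works on older NumPy too.
means_sum = (DATA * mask).sum(axis=(0, 2, 3)) / mask.sum()

print(means_where)
print(means_sum)
```

Unlike the indexing approaches, this never builds the (npoints, nz) intermediate, but it does touch every element of DATA, so which one is faster will likely depend on how sparse COND is.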