How to average over conditionally selected numpy array entries in a 4D array based on an index from a 3D array

Question

I want to average over conditionally selected elements in a 4D numpy array, based on an index using a 3D array.

In other words, my 4D array DATA has these dimensions: [ntime,nz,ny,nx]

where as my 3D array COND which I use to conditionally sample is only a function of [ntime,ny,nx] (with the number of time slices, x and y points identical)

I want to do broadcasting, thus use something like DATA[COND[None,...]] But the problem is that the "missing" vertical dimension is not at the right, but in the middle between time and the x and y space. I could loop over the vertical levels, but I think that would be slow. Is there a way of somehow indexing DATA as

DATA[cond[times],:,COND[ys],COND[xs]]?

Setting up some dummy arrays:

np.random.seed(1234)
COND=np.random.randint(0,2,(2,3,3))  # 2 time levels, 3 X points and 3 y points
DATA=np.random.randint(0,100,(2,2,3,3)) # 2 time levels, 2 Z levels, and 3 x and y points

giving:

COND
array([[[1, 1, 0],
        [1, 0, 0],
        [0, 1, 1]],

       [[1, 1, 1],
        [0, 0, 1],
        [0, 0, 0]]])

DATA
array([[[[26, 58, 92],
         [69, 80, 73],
         [47, 50, 76]],

        [[37, 34, 38],
         [67, 11,  0],
         [75, 80,  3]]],

giving:

   [[[ 2, 19, 12],
     [65, 75, 81],
     [14, 71, 60]],

    [[46, 28, 81],
     [87, 13, 96],
     [12, 69, 95]]]])

I can find the argument using argwhere:

idx=np.argwhere(COND==1)
array([[0, 0, 0],
       [0, 0, 1],
       [0, 1, 0],
       [0, 2, 1],
       [0, 2, 2],
       [1, 0, 0],
       [1, 0, 1],
       [1, 0, 2],
       [1, 1, 2]])

Now I want to do something like

np.mean(DATA[idx[...,None,...]])

or

np.mean(DATA[idx[0],None,idx[1],idx[2])

which should give me an answer with 2 numbers corresponding to the mean DATA values at the times, x and y points when COND=1

This question is related to this: filtering a 3D numpy array according to 2D numpy array

but my klev index is in the middle and not the left or right, so I can't use the [...,None] solution

Answer 1

Using `zip` to get indices along each axis

IIUC, you have done most of the work, ie idx

>>> [*zip(*idx)]
[(0, 0, 0, 0, 0, 1, 1, 1, 1),
 (0, 0, 1, 2, 2, 0, 0, 0, 1),
 (0, 1, 0, 1, 2, 0, 1, 2, 2)]

>>> t, y, x = zip(*idx)
>>> DATA[t, :, y, x]

array([[26, 37],
       [58, 34],
       [69, 67],
       [50, 80],
       [76,  3],
       [ 2, 46],
       [19, 28],
       [12, 81],
       [81, 96]])

>>> DATA[t, :, y, x].mean(0)
array([43.66666667, 52.44444444])

Get indices using `np.where`

An easier way to get the numpy.where :

>>> np.where(COND)
(array([0, 0, 0, 0, 0, 1, 1, 1, 1], dtype=int64),
 array([0, 0, 1, 2, 2, 0, 0, 0, 1], dtype=int64),
 array([0, 1, 0, 1, 2, 0, 1, 2, 2], dtype=int64))

Get indices using np.nonzero

Or, numpy.nonzero , probably the most explicit:

>>> np.nonzero(COND)
(array([0, 0, 0, 0, 0, 1, 1, 1, 1], dtype=int64),
 array([0, 0, 1, 2, 2, 0, 0, 0, 1], dtype=int64),
 array([0, 1, 0, 1, 2, 0, 1, 2, 2], dtype=int64))

Use the condition array directly

Notably, a handy trick while dealing with ndarray s is numpy.transpose , as you would have seen in the linked post, in your question, while indexing, dimensions are left aligned, but your array in the current form is not suitable for that kind of indexing, so if your aggregation dimension was at the very right, and index dimensions were to the left, that would do the trick.

So, if your data could be reordered:

Instead of:
dim = (2, 2, 3, 3)
axis-> 0, 1, 2, 3

It were:
dim = (2, 3, 3, 2)
axis-> 0, 2, 3, 1

It would have worked.

Reorder axes using `np.transpose`

You can use numpy.transpose for that:

>>> np.transpose(DATA, axes=(0,2,3,1))[COND==1].mean(axis=0)
array([43.66666667, 52.44444444])

Roll axes using `np.roll`

You could also roll your axis (==1) to the end (ie 4th dimension), using numpy.rollaxis :

>>> np.rollaxis(DATA, 1, 4)[COND==1].mean(0)
array([43.66666667, 52.44444444])

Move axes using `np.transpose`

Or, you could move your axis from source dimension to destination dimension, ie move axis 1 to axis 3, using np.moveaxis :

>>> np.moveaxis(DATA, source=1, destination=3)[COND==1].mean(0)
array([43.66666667, 52.44444444])

How to average over conditionally selected numpy array entries in a 4D array based on an index from a 3D array

Question

1 answers

solution1
1 ACCPTED 2021-01-27 14:51:42

Using `zip` to get indices along each axis

Get indices using `np.where`

Get indices using np.nonzero

Use the condition array directly

Reorder axes using `np.transpose`

Roll axes using `np.roll`

Move axes using `np.transpose`

How to average over conditionally selected numpy array entries in a 4D array based on an index from a 3D array

Question

1 answers

solution1 1 ACCPTED 2021-01-27 14:51:42

Using zip to get indices along each axis

Get indices using np.where

Get indices using np.nonzero

Use the condition array directly

Reorder axes using np.transpose

Roll axes using np.roll

Move axes using np.transpose

solution1
1 ACCPTED 2021-01-27 14:51:42

Using `zip` to get indices along each axis

Get indices using `np.where`

Reorder axes using `np.transpose`

Roll axes using `np.roll`

Move axes using `np.transpose`