简体   繁体   中英

How to average over conditionally selected numpy array entries in a 4D array based on an index from a 3D array

I want to average over conditionally selected elements in a 4D numpy array, based on an index using a 3D array.

In other words, my 4D array DATA has these dimensions: [ntime,nz,ny,nx]

where as my 3D array COND which I use to conditionally sample is only a function of [ntime,ny,nx] (with the number of time slices, x and y points identical)

I want to do broadcasting, thus use something like DATA[COND[None,...]] But the problem is that the "missing" vertical dimension is not at the right, but in the middle between time and the x and y space. I could loop over the vertical levels, but I think that would be slow. Is there a way of somehow indexing DATA as

DATA[cond[times],:,COND[ys],COND[xs]]?

Setting up some dummy arrays:

np.random.seed(1234)
COND=np.random.randint(0,2,(2,3,3))  # 2 time levels, 3 X points and 3 y points
DATA=np.random.randint(0,100,(2,2,3,3)) # 2 time levels, 2 Z levels, and 3 x and y points

giving:

COND
array([[[1, 1, 0],
        [1, 0, 0],
        [0, 1, 1]],

       [[1, 1, 1],
        [0, 0, 1],
        [0, 0, 0]]])

DATA
array([[[[26, 58, 92],
         [69, 80, 73],
         [47, 50, 76]],

        [[37, 34, 38],
         [67, 11,  0],
         [75, 80,  3]]],

giving:

   [[[ 2, 19, 12],
     [65, 75, 81],
     [14, 71, 60]],

    [[46, 28, 81],
     [87, 13, 96],
     [12, 69, 95]]]])

I can find the argument using argwhere:

idx=np.argwhere(COND==1)
array([[0, 0, 0],
       [0, 0, 1],
       [0, 1, 0],
       [0, 2, 1],
       [0, 2, 2],
       [1, 0, 0],
       [1, 0, 1],
       [1, 0, 2],
       [1, 1, 2]])

Now I want to do something like

np.mean(DATA[idx[...,None,...]])

or

np.mean(DATA[idx[0],None,idx[1],idx[2])

which should give me an answer with 2 numbers corresponding to the mean DATA values at the times, x and y points when COND=1

This question is related to this: filtering a 3D numpy array according to 2D numpy array

but my klev index is in the middle and not the left or right, so I can't use the [...,None] solution

Using zip to get indices along each axis

IIUC, you have done most of the work, ie idx

>>> [*zip(*idx)]
[(0, 0, 0, 0, 0, 1, 1, 1, 1),
 (0, 0, 1, 2, 2, 0, 0, 0, 1),
 (0, 1, 0, 1, 2, 0, 1, 2, 2)]

>>> t, y, x = zip(*idx)
>>> DATA[t, :, y, x]

array([[26, 37],
       [58, 34],
       [69, 67],
       [50, 80],
       [76,  3],
       [ 2, 46],
       [19, 28],
       [12, 81],
       [81, 96]])

>>> DATA[t, :, y, x].mean(0)
array([43.66666667, 52.44444444])

Get indices using np.where

An easier way to get the numpy.where :

>>> np.where(COND)
(array([0, 0, 0, 0, 0, 1, 1, 1, 1], dtype=int64),
 array([0, 0, 1, 2, 2, 0, 0, 0, 1], dtype=int64),
 array([0, 1, 0, 1, 2, 0, 1, 2, 2], dtype=int64))

Get indices using np.nonzero

Or, numpy.nonzero , probably the most explicit:

>>> np.nonzero(COND)
(array([0, 0, 0, 0, 0, 1, 1, 1, 1], dtype=int64),
 array([0, 0, 1, 2, 2, 0, 0, 0, 1], dtype=int64),
 array([0, 1, 0, 1, 2, 0, 1, 2, 2], dtype=int64))

Use the condition array directly

Notably, a handy trick while dealing with ndarray s is numpy.transpose , as you would have seen in the linked post, in your question, while indexing, dimensions are left aligned, but your array in the current form is not suitable for that kind of indexing, so if your aggregation dimension was at the very right, and index dimensions were to the left, that would do the trick.

So, if your data could be reordered:

Instead of:
dim = (2, 2, 3, 3)
axis-> 0, 1, 2, 3

It were:
dim = (2, 3, 3, 2)
axis-> 0, 2, 3, 1

It would have worked.

Reorder axes using np.transpose

You can use numpy.transpose for that:

>>> np.transpose(DATA, axes=(0,2,3,1))[COND==1].mean(axis=0)
array([43.66666667, 52.44444444])

Roll axes using np.roll

You could also roll your axis (==1) to the end (ie 4th dimension), using numpy.rollaxis :

>>> np.rollaxis(DATA, 1, 4)[COND==1].mean(0)
array([43.66666667, 52.44444444])

Move axes using np.transpose

Or, you could move your axis from source dimension to destination dimension, ie move axis 1 to axis 3, using np.moveaxis :

>>> np.moveaxis(DATA, source=1, destination=3)[COND==1].mean(0)
array([43.66666667, 52.44444444]) 

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM