
Rebin irregularly gridded data to regular (2D) grid in Python, using mean/median

I'm looking for a way to rebin irregularly gridded data onto a regular grid, but without interpolation (so not, e.g., matplotlib.mlab.griddata). Preferably, I'd like to take the average or median of the points within each cell, or even apply my own function.

The grid is 2D, but since I foresee future cases with different dimensions, an N-dimensional solution is even better.

As an example, consider the following data, with x and y coordinates:

data = np.arange(6)
x = np.array([0.4, 0.6, 0.8, 1.5, 1.8, 2.2])
y = np.array([0.4, 0.8, 2.3, 2.5, 2.7, 2.9])

which, when binned to a regular 3x3 grid and using average values, should result in:

[[ 0.5  nan  2. ]
 [ nan  nan  3.5]
 [ nan  nan  5. ]]

(NaNs are optional, but clearer than 0s, since 0 can be an actual average; the result is of course also easy to turn into a masked array.)
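As a minimal sketch of that conversion, assuming the binned result is held in a plain NumPy array (the name `grid` here is only illustrative), `np.ma.masked_invalid` masks every NaN cell so that later reductions skip the empty cells:

```python
import numpy as np

# The expected 3x3 result from above, with NaN marking empty cells.
grid = np.array([[0.5, np.nan, 2.0],
                 [np.nan, np.nan, 3.5],
                 [np.nan, np.nan, 5.0]])

# Mask NaN (and inf) entries; the mask is True for empty cells.
masked = np.ma.masked_invalid(grid)

print(masked.count())  # number of cells that actually contain data
print(masked.mean())   # mean over the filled cells only
```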

So far, I've been able to tackle the problem using Pandas:

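import numpy as np
import pandas as pd

NX, NY = 3, 3  # number of cells along x and y (the 3x3 grid above)
# np.digitize returns 1-based bin indices for the edges 0, 1, 2, ...
xindices = np.digitize(x, np.arange(NX))
yindices = np.digitize(y, np.arange(NY))
df = pd.DataFrame({
    'x': xindices,
    'y': yindices,
    'z': data
})
grouped = df.groupby(['y', 'x'])
result = grouped.aggregate('mean').reset_index()
# start from an all-NaN grid and fill in only the occupied cells
grid = np.full((NX, NY), np.nan)
grid[result['x'].to_numpy() - 1, result['y'].to_numpy() - 1] = result['z'].to_numpy()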

which allows me to pick any aggregating function I like.

However, since Pandas is rather general (it doesn't care that x and y are grid indices), I feel that this may not be the optimal solution: a solution that knows the input and output are already on a (2D) grid seems more efficient. I have, however, not been able to find one; np.digitize comes closest, but it is only one-dimensional and still requires a Python loop to access the indices and take the average or median of the data.

Does anyone know a better solution than the one above?

You could use scipy.stats.binned_statistic_2d:

import numpy as np
import scipy.stats as stats

data = np.arange(6)
x = np.array([0.4, 0.6, 0.8, 1.5, 1.8, 2.2])
y = np.array([0.4, 0.8, 2.3, 2.5, 2.7, 2.9])

# NX and NY count the bin edges here: 4 edges (0, 1, 2, 3) give 3 bins per axis
NX, NY = 4, 4
statistic, xedges, yedges, binnumber = stats.binned_statistic_2d(
    x, y, values=data, statistic='mean',
    bins=[np.arange(NX), np.arange(NY)])
print(statistic)

which yields

[[ 0.5  nan  2. ]
 [ nan  nan  3.5]
 [ nan  nan  5. ]]

There is also binned_statistic_dd for higher-dimensional binning. Each of these functions supports user-defined statistics by passing a callable to the statistic parameter.
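As a rough sketch of both points, the example below passes a callable to binned_statistic_2d and then bins the same data in three dimensions with binned_statistic_dd. The helper `value_range` and the third coordinate `z` are hypothetical, made up purely for illustration; they are not part of the question's data.

```python
import numpy as np
from scipy import stats

data = np.arange(6)
x = np.array([0.4, 0.6, 0.8, 1.5, 1.8, 2.2])
y = np.array([0.4, 0.8, 2.3, 2.5, 2.7, 2.9])
edges = np.arange(4)  # 4 edges -> 3 bins per axis


def value_range(values):
    """Hypothetical user-defined statistic: max minus min within a bin."""
    return np.max(values) - np.min(values)


# A callable receives the 1D array of values that fall into each bin.
spread = stats.binned_statistic_2d(
    x, y, values=data, statistic=value_range, bins=[edges, edges]).statistic
print(spread)

# N-dimensional case: the sample has shape (n_points, n_dims).
z = np.array([0.5, 0.5, 0.5, 1.5, 1.5, 2.5])  # assumed third coordinate
sample = np.column_stack([x, y, z])
cube = stats.binned_statistic_dd(
    sample, values=data, statistic='mean', bins=[edges, edges, edges]).statistic
print(cube.shape)
```

Empty bins should come out as NaN in both cases. For the median, the built-in string statistic='median' can be used instead of a callable.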
