I'm looking for a way to rebin irregularly gridded data onto a regular grid, but without interpolation (so not, e.g., matplotlib.mlab.griddata). Preferably, I'd like to take the average or median of the points within each cell, or even apply my own function.
The grid is 2D, but since I foresee future cases with different dimensions, an N-dimensional solution would be even better.
As an example, consider the following data, with x and y coordinates:
data = np.arange(6)
x = np.array([0.4, 0.6, 0.8, 1.5, 1.8, 2.2])
y = np.array([0.4, 0.8, 2.3, 2.5, 2.7, 2.9])
which, when binned onto a regular 3x3 grid of unit cells (edges at 0, 1, 2 and 3) and averaged per cell, should result in the following (rows correspond to x bins, columns to y bins):
[[0.5 nan 2. ]
 [nan nan 3.5]
 [nan nan 5. ]]
(NaNs are optional, but clearer than 0s, since 0 can be an actual average; this is of course also easy to turn into a masked array.)
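For illustration, the NaN-to-masked-array conversion might look like this. The grid is simply the expected 3x3 result above, hard-coded; np.ma.masked_invalid masks NaN (and inf) entries so that masked reductions skip the empty cells:

```python
import numpy as np

# Expected 3x3 mean grid from the example; NaN marks cells with no points.
grid = np.array([[0.5, np.nan, 2.0],
                 [np.nan, np.nan, 3.5],
                 [np.nan, np.nan, 5.0]])

# Mask NaN/inf entries; masked reductions then ignore the empty cells.
masked = np.ma.masked_invalid(grid)
print(masked.count())  # number of non-empty cells
print(masked.mean())   # mean over non-empty cells only
```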
So far, I've been able to tackle the problem using Pandas:
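import numpy as np
import pandas as pd

NX, NY = 3, 3  # number of grid cells along x and y
# 1-based cell indices: [0, 1) -> 1, [1, 2) -> 2, [2, inf) -> 3
xindices = np.digitize(x, np.arange(NX))
yindices = np.digitize(y, np.arange(NY))
df = pd.DataFrame({
    'x': xindices,
    'y': yindices,
    'z': data
})
grouped = df.groupby(['y', 'x'])
result = grouped.aggregate('mean').reset_index()  # 'median' or a callable also work
grid = np.full((NX, NY), np.nan)  # empty cells stay NaN
grid[result['x'] - 1, result['y'] - 1] = result['z']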
which allows me to pick any aggregating function I like.
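As a sketch of swapping in other aggregations with the same Pandas approach, the grid construction can be wrapped in a small function. The helper name grid_aggregate is hypothetical, and it assumes unit-width cells starting at 0 as in the example; the aggregation is passed to SeriesGroupBy.agg, which accepts a string such as "median" or a callable:

```python
import numpy as np
import pandas as pd

data = np.arange(6)
x = np.array([0.4, 0.6, 0.8, 1.5, 1.8, 2.2])
y = np.array([0.4, 0.8, 2.3, 2.5, 2.7, 2.9])
NX, NY = 3, 3  # number of unit-width cells along x and y (assumed)


def grid_aggregate(func):
    """Hypothetical helper: bin (x, y, data) into an NX x NY grid, applying func per cell."""
    df = pd.DataFrame({
        "x": np.digitize(x, np.arange(NX)),  # 1-based cell index along x
        "y": np.digitize(y, np.arange(NY)),  # 1-based cell index along y
        "z": data,
    })
    # One aggregated value per occupied (x, y) cell.
    result = df.groupby(["x", "y"])["z"].agg(func).reset_index()
    grid = np.full((NX, NY), np.nan)  # unoccupied cells remain NaN
    grid[result["x"].to_numpy() - 1, result["y"].to_numpy() - 1] = result["z"].to_numpy()
    return grid


median_grid = grid_aggregate("median")
# A user-defined function receives the values of one cell as a Series.
range_grid = grid_aggregate(lambda v: v.max() - v.min())
print(median_grid)
print(range_grid)
```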
However, since Pandas is rather general (it doesn't know that x and y are grid indices), I suspect this may not be the optimal solution: an approach that knows the input and output are already on a (2D) grid seems likely to be more efficient. I have, however, not been able to find one; np.digitize comes closest, but it is only one-dimensional, and still requires a loop in Python to access the indices and take the average or median over the data.
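For the mean specifically, the per-axis np.digitize indices can be combined without a Python loop: np.ravel_multi_index turns them into one flat cell number per point, and np.bincount sums values and counts per cell. This is a minimal NumPy-only sketch, assuming unit cells with edges 0, 1, 2, 3 and that every point lies inside those edges. It covers sums, counts and means, but not arbitrary functions such as the median. Note also that np.digitize treats the last edge as exclusive, unlike SciPy's binned statistics:

```python
import numpy as np

data = np.arange(6)
x = np.array([0.4, 0.6, 0.8, 1.5, 1.8, 2.2])
y = np.array([0.4, 0.8, 2.3, 2.5, 2.7, 2.9])
edges = [np.arange(4), np.arange(4)]  # assumed cell edges 0, 1, 2, 3 per axis
shape = tuple(len(e) - 1 for e in edges)  # (3, 3) cells

# Per-axis 0-based cell index; works for any number of coordinate arrays.
# Assumes all points fall inside the edges (out-of-range points would raise below).
idx = tuple(np.digitize(c, e) - 1 for c, e in zip((x, y), edges))

# Combine the per-axis indices into a single flat cell number per point.
flat = np.ravel_multi_index(idx, shape)

# Per-cell sums and counts in one vectorized pass each.
n_cells = int(np.prod(shape))
sums = np.bincount(flat, weights=data, minlength=n_cells)
counts = np.bincount(flat, minlength=n_cells)
with np.errstate(invalid="ignore"):
    mean_grid = (sums / counts).reshape(shape)  # 0/0 gives NaN for empty cells
print(mean_grid)
```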
Does anyone know a better solution than the one above?
You could use scipy.stats.binned_statistic_2d:
import numpy as np
import scipy.stats as stats
data = np.arange(6)
x = np.array([0.4, 0.6, 0.8, 1.5, 1.8, 2.2])
y = np.array([0.4, 0.8, 2.3, 2.5, 2.7, 2.9])
NX, NY = 4, 4  # number of bin edges per axis (0, 1, 2, 3), i.e. 3 cells each
statistic, xedges, yedges, binnumber = stats.binned_statistic_2d(
    x, y, values=data, statistic='mean',
    bins=[np.arange(NX), np.arange(NY)])
print(statistic)  # rows correspond to x bins, columns to y bins
which yields
[[0.5 nan 2. ]
 [nan nan 3.5]
 [nan nan 5. ]]
There is also binned_statistic_dd for higher-dimensional binning. Each of these functions supports user-defined statistics: pass a callable as the statistic parameter.
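A sketch of the same example through scipy.stats.binned_statistic_dd, using both the built-in "median" statistic and a user-defined callable. The function name value_range is a hypothetical example (max minus min per cell). It handles empty input explicitly, since SciPy may call the callable on an empty list to determine the fill value for empty bins. The sample is passed as an (N, D) array, one row per point:

```python
import numpy as np
import scipy.stats as stats

data = np.arange(6)
x = np.array([0.4, 0.6, 0.8, 1.5, 1.8, 2.2])
y = np.array([0.4, 0.8, 2.3, 2.5, 2.7, 2.9])

# (N, D) array: one row per point, one column per dimension.
sample = np.column_stack((x, y))
edges = [np.arange(4), np.arange(4)]  # one edge array per dimension


def value_range(values):
    """Hypothetical user-defined statistic: max - min of the values in one cell."""
    if len(values) == 0:  # empty cell: return NaN explicitly
        return np.nan
    return np.max(values) - np.min(values)


median_grid = stats.binned_statistic_dd(
    sample, data, statistic="median", bins=edges).statistic
range_grid = stats.binned_statistic_dd(
    sample, data, statistic=value_range, bins=edges).statistic
print(median_grid)
print(range_grid)
```

The same call works unchanged for more dimensions: add a column to sample and an edge array to edges.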