Rebin irregularly gridded data to regular (2D) grid in Python, using mean/median

I'm looking for a way to rebin irregularly gridded data onto a regular grid, but without interpolation (so not, e.g., matplotlib.mlab.griddata). Preferably, I'd like to average the points within one cell, take their median, or even apply my own function.

The grid is 2D, but since I foresee future cases with different dimensions, an N-dimensional solution would be even better.

As an example, consider the following data, with x and y coordinates:

import numpy as np

data = np.arange(6)
x = np.array([0.4, 0.6, 0.8, 1.5, 1.8, 2.2])
y = np.array([0.4, 0.8, 2.3, 2.5, 2.7, 2.9])

which, when binned onto a regular 3x3 grid of unit-sized cells and averaged per cell, should result in (rows indexed by x, columns by y):

[[ 0.5  nan  2. ]
 [ nan  nan  3.5]
 [ nan  nan  5. ]]

(NaNs are optional, but clearer than 0s, since 0 can be an actual average; this is of course also easy to turn into a masked array.)
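Turning the NaN-filled result into a masked array takes one call. A minimal sketch using NumPy's np.ma.masked_invalid on the expected grid above; masked reductions then skip the empty cells instead of propagating NaN:

```python
import numpy as np

# The expected 3x3 result from above, with NaN marking empty cells
grid = np.array([[0.5, np.nan, 2.0],
                 [np.nan, np.nan, 3.5],
                 [np.nan, np.nan, 5.0]])

# masked_invalid masks NaN (and inf) entries
masked = np.ma.masked_invalid(grid)
print(masked.count())  # number of non-empty cells -> 4
print(masked.mean())   # mean over non-empty cells only -> 2.75
```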

So far, I've been able to tackle the problem using Pandas:

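import numpy as np
import pandas as pd

NX, NY = 3, 3  # 3x3 grid of unit-sized cells starting at 0
# np.digitize returns 1-based cell indices here; coordinates >= NX-1 land in the last cell
xindices = np.digitize(x, np.arange(NX))
yindices = np.digitize(y, np.arange(NY))
df = pd.DataFrame({
    'x': xindices,
    'y': yindices,
    'z': data
})
grouped = df.groupby(['y', 'x'])
result = grouped.aggregate('mean').reset_index()  # or 'median', or any callable
grid = np.full((NX, NY), np.nan)  # cells without points stay NaN
grid[result['x'] - 1, result['y'] - 1] = result['z']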

which allows me to pick any aggregating function I like.
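For example, the same groupby can take a built-in name such as 'median' or an arbitrary callable. A minimal sketch, wrapping the approach in a hypothetical helper rebin and assuming the example data and a 3x3 grid of unit cells; the max-minus-min lambda is only an illustrative custom statistic:

```python
import numpy as np
import pandas as pd

data = np.arange(6)
x = np.array([0.4, 0.6, 0.8, 1.5, 1.8, 2.2])
y = np.array([0.4, 0.8, 2.3, 2.5, 2.7, 2.9])
NX, NY = 3, 3

def rebin(agg):
    """Hypothetical helper: bin (x, y) into unit cells and aggregate data with agg."""
    df = pd.DataFrame({
        'x': np.digitize(x, np.arange(NX)),  # 1-based cell indices
        'y': np.digitize(y, np.arange(NY)),
        'z': data,
    })
    # agg may be a string such as 'median' or a callable taking a Series
    result = df.groupby(['y', 'x'])['z'].agg(agg).reset_index()
    grid = np.full((NX, NY), np.nan)  # empty cells stay NaN
    grid[result['x'].to_numpy() - 1, result['y'].to_numpy() - 1] = result['z'].to_numpy()
    return grid

median_grid = rebin('median')
range_grid = rebin(lambda s: s.max() - s.min())  # illustrative custom statistic
print(median_grid)
print(range_grid)
```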

However, since Pandas is rather general (it doesn't care that x and y are grid indices), I feel that this may not be the optimal solution: a solution that knows the input and output are already on a (2D) grid seems likely to be more efficient. I have, however, not been able to find one; np.digitize comes closest, but it is only one-dimensional, and still requires a loop in Python to access the indices and compute the average or median over the data.
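The loop can in fact be avoided for the mean with plain NumPy. A minimal sketch: np.ravel_multi_index combines the per-axis cell indices into one flat cell index, and np.bincount then sums the values and counts the points per cell. It assumes unit-sized cells starting at 0 and that every point lies inside the grid (np.ravel_multi_index raises an error otherwise); the helper name binned_mean is hypothetical. This bincount technique only covers sum-based statistics such as the mean, not the median.

```python
import numpy as np

def binned_mean(coords, values, shape):
    """Hypothetical helper: mean of values per cell of a grid with unit-sized cells.

    coords: sequence of 1-D arrays, one per dimension (e.g. (x, y)).
    shape:  number of cells along each dimension.
    Assumes every point lies inside the grid.
    """
    # Per-axis integer cell index: floor of each coordinate
    idx = tuple(np.floor(c).astype(int) for c in coords)
    # Collapse the N per-axis indices into a single flat cell index
    flat = np.ravel_multi_index(idx, shape)
    ncells = int(np.prod(shape))
    sums = np.bincount(flat, weights=values, minlength=ncells)
    counts = np.bincount(flat, minlength=ncells)
    with np.errstate(invalid='ignore', divide='ignore'):
        means = sums / counts  # 0/0 gives NaN for empty cells
    return means.reshape(shape)

data = np.arange(6)
x = np.array([0.4, 0.6, 0.8, 1.5, 1.8, 2.2])
y = np.array([0.4, 0.8, 2.3, 2.5, 2.7, 2.9])
grid = binned_mean((x, y), data, (3, 3))
print(grid)
```

For the example data this reproduces the expected 3x3 result, with rows indexed by x and columns by y; passing a third coordinate array and a three-element shape works the same way.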

Does anyone know a better solution than the one above?

You could use scipy.stats.binned_statistic_2d:

import numpy as np
import scipy.stats as stats

data = np.arange(6)
x = np.array([0.4, 0.6, 0.8, 1.5, 1.8, 2.2])
y = np.array([0.4, 0.8, 2.3, 2.5, 2.7, 2.9])

NX, NY = 4, 4  # number of bin edges per axis (0, 1, 2, 3), giving 3x3 cells
statistic, xedges, yedges, binnumber = stats.binned_statistic_2d(
    x, y, values=data, statistic='mean', 
    bins=[np.arange(NX), np.arange(NY)])
print(statistic)

which yields

[[ 0.5  nan  2. ]
 [ nan  nan  3.5]
 [ nan  nan  5. ]]
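Switching the aggregation only means changing the statistic argument. A minimal sketch, reusing the example data, that computes the per-cell median asked for in the question, plus the per-cell point count ('count' is another built-in statistic name, handy for telling empty cells apart):

```python
import numpy as np
import scipy.stats as stats

data = np.arange(6)
x = np.array([0.4, 0.6, 0.8, 1.5, 1.8, 2.2])
y = np.array([0.4, 0.8, 2.3, 2.5, 2.7, 2.9])

edges = np.arange(4)  # 4 edges per axis -> 3 cells per axis
median, _, _, _ = stats.binned_statistic_2d(
    x, y, values=data, statistic='median', bins=[edges, edges])
count, _, _, _ = stats.binned_statistic_2d(
    x, y, values=data, statistic='count', bins=[edges, edges])
print(median)  # empty cells are NaN
print(count)   # empty cells are 0
```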

There is also binned_statistic_dd for higher-dimensional binning. Each of these functions supports user-defined statistics by passing a callable to the statistic parameter.
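A minimal sketch of binned_statistic_dd with a user-defined callable, reusing the example data in two dimensions (a third coordinate would simply be another column of sample). The value_range function is a hypothetical statistic chosen for illustration; it returns NaN for an empty input because SciPy may evaluate the callable on an empty array to fill empty cells:

```python
import numpy as np
import scipy.stats as stats

data = np.arange(6, dtype=float)
x = np.array([0.4, 0.6, 0.8, 1.5, 1.8, 2.2])
y = np.array([0.4, 0.8, 2.3, 2.5, 2.7, 2.9])

def value_range(vals):
    """Hypothetical custom statistic: spread of the values in one cell."""
    return np.max(vals) - np.min(vals) if len(vals) else np.nan

edges = np.arange(4)  # 3 unit cells per axis
# binned_statistic_dd takes the coordinates as an (npoints, ndim) array
sample = np.column_stack([x, y])
result = stats.binned_statistic_dd(
    sample, data, statistic=value_range, bins=[edges, edges])
print(result.statistic)
```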

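Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM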