Using scipy.stats to fit xarray DataArray

I want to compute the parameters of a statistical distribution fitted over the time dimension of an xarray.DataArray.

I'd like to create a function that does something like:

from scipy import stats
import xarray as xr

def fit(arr):
    return xr.apply_ufunc(stats.norm.fit, arr, ...)

that returns a new DataArray storing the two parameters of the distribution computed over the time dimension. So if an input has dimensions (time, lat, lon), fit would return a DataArray with dimensions (params, lat, lon). The next step would be to use these parameters to compute various percentiles (e.g. with stats.norm.ppf).

After many unsuccessful trials, I doubt that apply_ufunc supports this use case, and I suspect I should instead do the computation using
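To make the percentile step concrete, here is a minimal sketch. It assumes a parameter array already exists with a `params` coordinate labelled `loc` and `scale`; the values, the coordinate labels, and the `quantile` dimension name are made up for illustration and are not from the original post. Because stats.norm.ppf works element-wise, apply_ufunc needs no core dimensions for this part.

```python
import numpy as np
import xarray as xr
from scipy import stats

# Hypothetical parameter array with dimensions (params, lat, lon),
# filled with made-up values purely for illustration.
params = xr.DataArray(
    np.stack([np.full((2, 3), 10.0), np.full((2, 3), 2.0)]),
    dims=("params", "lat", "lon"),
    coords={"params": ["loc", "scale"]},
)

# Probabilities to evaluate, held in a DataArray so they broadcast
# against the (lat, lon) dimensions of the parameters.
probs = [0.05, 0.5, 0.95]
q = xr.DataArray(probs, dims="quantile", coords={"quantile": probs})

# drop=True removes the scalar "params" coordinate left behind by .sel,
# so the two selections do not carry conflicting labels.
loc = params.sel(params="loc", drop=True)
scale = params.sel(params="scale", drop=True)

# ppf(q, loc, scale) is evaluated element-wise after broadcasting.
percentiles = xr.apply_ufunc(stats.norm.ppf, q, loc, scale)
```

For a normal distribution the 0.5 quantile equals `loc`, which gives a quick sanity check on the result.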

params = np.apply_along_axis(stats.norm.fit, arr.get_axis_num('time'), arr.data)

then create the DataArray manually, copying dimensions and attributes.

Thoughts? Suggestions?
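A minimal sketch of that manual route might look like the following. The helper name `fit_norm`, the `params` dimension name, and the synthetic input are assumptions for illustration. It relies on np.apply_along_axis converting the (loc, scale) tuple returned by stats.norm.fit into a length-2 axis that replaces the time axis.

```python
import numpy as np
import xarray as xr
from scipy import stats

def fit_norm(arr, dim="time"):
    """Fit a normal distribution along `dim` (hypothetical helper)."""
    axis = arr.get_axis_num(dim)
    # norm.fit returns (loc, scale), so the fitted axis ends up with length 2.
    data = np.apply_along_axis(stats.norm.fit, axis, arr.data)
    # The new "params" dimension takes the place of the fitted dimension.
    dims = tuple("params" if d == dim else d for d in arr.dims)
    # Keep only the coordinates that do not depend on the fitted dimension.
    coords = {k: v for k, v in arr.coords.items() if dim not in v.dims}
    coords["params"] = ["loc", "scale"]
    return xr.DataArray(data, dims=dims, coords=coords, attrs=arr.attrs)

# Synthetic input with dimensions (time, lat, lon).
rng = np.random.default_rng(0)
arr = xr.DataArray(
    rng.normal(loc=5.0, scale=2.0, size=(200, 2, 3)),
    dims=("time", "lat", "lon"),
    coords={"lat": [10.0, 20.0], "lon": [0.0, 1.0, 2.0]},
    attrs={"units": "K"},
)
params = fit_norm(arr)
```

This runs eagerly on the NumPy data, so it is only suitable when the array fits in memory.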


Here is what I ended up doing, which feels a bit like a hack:

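import dask.array
import xarray as xr

# `dc` is assumed to be a scipy.stats continuous distribution (e.g. stats.norm),
# and `time` is assumed to be the first dimension of `arr`.

# Fit the parameters (lazy computation)
data = dask.array.apply_along_axis(dc.fit, arr.get_axis_num('time'), arr)

# Reduce over time to get a DataArray whose dimensions and coordinates can be copied to the parameter array.
mean = arr.mean(dim='time', keep_attrs=True)
coords = dict(mean.coords.items())
# Shape parameter names (if any) come first; strip the space that follows each comma in dc.shapes.
shape_names = [] if dc.shapes is None else [s.strip() for s in dc.shapes.split(',')]
coords['dparams'] = shape_names + ['loc', 'scale']
out = xr.DataArray(data=data, coords=coords, dims=('dparams',) + mean.dims)
out.attrs = arr.attrs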

Dask array includes an analogue of apply_along_axis, which may be the most obvious place to start. Note that each variable of an xarray object that has chunks set automatically wraps a dask array, accessible through the .data attribute. You may even be able to pass the xarray variable directly.
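As a rough sketch of that suggestion, assuming dask is installed and using a small synthetic array (the chunk sizes and the wrapper name `fit_1d` are made up for illustration):

```python
import dask.array as da
import numpy as np
import xarray as xr
from scipy import stats

# Synthetic (time, lat, lon) data; chunking wraps the values in a dask array.
rng = np.random.default_rng(0)
arr = xr.DataArray(
    rng.normal(size=(100, 4, 6)), dims=("time", "lat", "lon")
).chunk({"lat": 2, "lon": 3})

def fit_1d(x):
    # Return an ndarray rather than the (loc, scale) tuple from norm.fit.
    return np.asarray(stats.norm.fit(x))

# Build the lazy computation on the underlying dask array.
lazy = da.apply_along_axis(fit_1d, arr.get_axis_num("time"), arr.data)

# Nothing is fitted until compute() is called.
params = lazy.compute()
```

The result here is a plain array with the time axis replaced by a length-2 parameter axis; dimensions and coordinates still have to be attached separately.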
