How can I speed up xarray resample (much slower than pandas resample)?
Here is an MWE for resampling a time series in xarray vs. pandas. Resampling the 10Min series to hourly means takes 6.8 seconds in xarray and 0.003 seconds in pandas. Is there some way to get the pandas speed in xarray? The pandas resample time seems to be independent of the sampling period, while the xarray time scales with it. Each series has a fixed 100,000 samples, so a longer period covers a longer time span and produces more hourly bins.
import time

import numpy as np
import pandas as pd
import xarray as xr


def make_ds(freq):
    """Build a Dataset with one variable 'foo' of 100,000 samples at the given frequency."""
    size = 100000
    times = pd.date_range('2000-01-01', periods=size, freq=freq)
    ds = xr.Dataset({
        'foo': xr.DataArray(
            data=np.random.random(size),
            dims=['time'],
            coords={'time': times},
        )})
    return ds


# "1h" is the hourly alias; the older "1H" spelling is deprecated since pandas 2.2
for f in ["1s", "1Min", "10Min"]:
    ds = make_ds(f)
    # xarray: resample to hourly means
    start = time.time()
    ds_r = ds.resample({'time': "1h"}).mean()
    print(f, 'xr', str(time.time() - start))
    # pandas: convert to a DataFrame, then resample to hourly means
    start = time.time()
    ds_r = ds.to_dataframe().resample("1h").mean()
    print(f, 'pd', str(time.time() - start))
Output:
1s xr 0.040313720703125
1s pd 0.0033435821533203125
1Min xr 0.5757267475128174
1Min pd 0.0025794506072998047
10Min xr 6.798743486404419
10Min pd 0.0029947757720947266
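One way to see why the xarray timings grow with the period: with a fixed 100,000 samples, a longer sampling period means more hourly bins to reduce. The sketch below (plain pandas; `hourly_bin_count` is a hypothetical helper, not a library function) counts those bins. It gives 28 bins for 1s, 1667 for 1Min and 16667 for 10Min. The 1Min and 10Min xarray timings above work out to roughly 0.35 to 0.41 ms per bin. That is consistent with a per-group cost rather than a per-sample one, while pandas stays near 3 ms throughout.

```python
import pandas as pd

SIZE = 100_000  # samples per series, matching make_ds in the question


def hourly_bin_count(freq: str, size: int = SIZE) -> int:
    """Number of hourly bins produced by resampling `size` samples spaced at `freq`."""
    times = pd.date_range("2000-01-01", periods=size, freq=freq)
    # Resampler.size() returns one entry per output bin
    return len(pd.Series(0, index=times).resample("1h").size())


# Same frequencies as the question, written with the lowercase "min" alias
for freq in ["1s", "1min", "10min"]:
    print(freq, hourly_bin_count(freq))
# prints: 1s 28 / 1min 1667 / 10min 16667 (one per line)
```

The counts also follow from simple arithmetic: `(SIZE - 1) * step_seconds // 3600 + 1`, because the first sample falls exactly on an hour boundary.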
As per the xarray GitHub issue, this is an implementation issue. The solution is to do the resampling (which is actually a GroupBy) in other code. My solution is to use the fast pandas resample and then rebuild the xarray dataset:
df_h = ds.to_dataframe().resample("1h").mean()  # what we want (quickly), but in pandas form
# one DataArray per column, keeping each variable's attrs
vals = [xr.DataArray(data=df_h[c], dims=['time'], coords={'time': df_h.index}, attrs=ds[c].attrs) for c in df_h.columns]
ds_h = xr.Dataset(dict(zip(df_h.columns, vals)), attrs=ds.attrs)  # keep the global attrs too
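The same workaround can be packaged as a reusable function. This is a minimal sketch: `resample_mean_via_pandas` is a hypothetical helper, not an xarray or pandas API. It assumes every data variable depends only on the time dimension and that the Dataset has no extra non-index coordinates, since those would turn into DataFrame columns. It uses `xr.Dataset.from_dataframe` instead of building each `DataArray` by hand, then copies the attrs back from the source, as the snippet above does.

```python
import numpy as np
import pandas as pd
import xarray as xr


def resample_mean_via_pandas(ds: xr.Dataset, freq: str) -> xr.Dataset:
    """Resample a 1-D time-indexed Dataset to `freq` means using pandas.

    Hypothetical helper: assumes every data variable has only the time
    dimension and the Dataset has no extra non-index coordinates.
    """
    df_r = ds.to_dataframe().resample(freq).mean()  # fast pandas resample
    out = xr.Dataset.from_dataframe(df_r)  # the index name (e.g. 'time') becomes the dim
    # Copy global and per-variable attrs back from the source Dataset
    out.attrs.update(ds.attrs)
    for name in out.data_vars:
        out[name].attrs.update(ds[name].attrs)
    return out


# Usage: 600 ten-minute samples (100 hours) resampled to hourly means
times = pd.date_range("2000-01-01", periods=600, freq="10min")
ds = xr.Dataset(
    {"foo": ("time", np.arange(600, dtype=float), {"units": "K"})},
    coords={"time": times},
    attrs={"source": "example"},
)
ds_h = resample_mean_via_pandas(ds, "1h")
print(ds_h["foo"].values[:3])  # [ 2.5  8.5 14.5]
```

The result should match `ds.resample(time="1h").mean()` for gap-free data like this. With gaps, both libraries emit NaN for empty bins.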