How can I speed up xarray resample (much slower than pandas resample)?

Question

Here is an MWE for resampling a time series in xarray vs. pandas. The 10Min resample takes 6.8 seconds in xarray and 0.003 seconds in pandas. Is there some way to get the pandas speed in xarray? The pandas resample time seems to be independent of the period, while the xarray resample time scales with the period.

import numpy as np
import xarray as xr
import pandas as pd
import time

def make_ds(freq):
    size = 100000
    times = pd.date_range('2000-01-01', periods=size, freq=freq)
    ds = xr.Dataset({
        'foo': xr.DataArray(
            data   = np.random.random(size),
            dims   = ['time'],
            coords = {'time': times}
        )})
    return ds

for f in ["1s", "1Min", "10Min"]:
    ds = make_ds(f)

    start = time.time()
    ds_r = ds.resample({'time':"1H"}).mean()
    print(f, 'xr', str(time.time() - start))

    start = time.time()
    ds_r = ds.to_dataframe().resample("1H").mean()
    print(f, 'pd', str(time.time() - start))

1s xr 0.040313720703125
1s pd 0.0033435821533203125
1Min xr 0.5757267475128174
1Min pd 0.0025794506072998047
10Min xr 6.798743486404419
10Min pd 0.0029947757720947266
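
One way to read these timings: the input size is always 100000 samples, but the number of hourly output bins changes with the sampling period. The sketch below only counts those bins with pandas; `hourly_bin_count` is a hypothetical helper name, not part of either library. The xarray times above grow roughly in step with the bin count, which hints that the cost is per group rather than per sample, though this snippet does not prove it.

```python
import pandas as pd

def hourly_bin_count(freq, size=100_000):
    """Count the distinct hourly bins covered by `size` samples spaced `freq` apart."""
    times = pd.date_range("2000-01-01", periods=size, freq=freq)
    # Flooring every timestamp to the hour and counting unique values
    # gives the number of groups an hourly resample has to produce.
    return times.floor("h").nunique()

for f in ["1s", "1min", "10min"]:
    print(f, hourly_bin_count(f))
# prints:
# 1s 28
# 1min 1667
# 10min 16667
```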

Answer 1

As per the xarray GH issue, this is an implementation issue. The solution is to do the resampling (actually a GroupBy) in other code. My solution is to use the fast pandas resample and then rebuild the xarray dataset:

df_h = ds.to_dataframe().resample("1H").mean()  # what we want (quickly), but in Pandas form
vals = [xr.DataArray(data=df_h[c], dims=['time'], coords={'time':df_h.index}, attrs=ds[c].attrs) for c in df_h.columns]
ds_h = xr.Dataset(dict(zip(df_h.columns,vals)), attrs=ds.attrs)
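
The same round trip can be wrapped in a small function. This is only a sketch under some assumptions: `resample_mean_via_pandas` is a made-up name, the dataset has a single `time` dimension and no extra non-index coordinates, and `xr.Dataset.from_dataframe` is used for the rebuild instead of the list comprehension above. Attributes are copied back by hand because they do not survive the trip through pandas.

```python
import numpy as np
import pandas as pd
import xarray as xr

def resample_mean_via_pandas(ds, freq):
    """Hypothetical helper: resample a time-only Dataset in pandas, then rebuild it.

    Assumes every variable in `ds` has the single dimension 'time'.
    """
    # Fast path: let pandas do the resampling on the DataFrame form.
    df = ds.to_dataframe().resample(freq).mean()
    # The DataFrame index (named 'time') becomes the dimension again.
    out = xr.Dataset.from_dataframe(df)
    # Restore dataset-level and per-variable attributes.
    out.attrs.update(ds.attrs)
    for name in out.data_vars:
        out[name].attrs.update(ds[name].attrs)
    return out

# Small usage example: 120 one-minute samples averaged into 2 hourly bins.
times = pd.date_range("2000-01-01", periods=120, freq="1min")
ds_demo = xr.Dataset(
    {"foo": xr.DataArray(np.arange(120.0), dims=["time"],
                         coords={"time": times}, attrs={"units": "m"})},
    attrs={"title": "demo"},
)
ds_demo_h = resample_mean_via_pandas(ds_demo, "1h")
print(ds_demo_h["foo"].values)  # [29.5 89.5]
```

For datasets with more dimensions than `time`, `to_dataframe()` returns a MultiIndex and this approach would need extra handling, so treat it as a workaround for the 1-D case only.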
