Rolling quantile with xarray

Question

Is there an xarray way of computing quantiles on a DataArray.rolling window? The listed available methods include mean and median, but nothing for quantiles/percentiles. I was wondering whether this could somehow be done even though there is no direct method.

Currently, I convert the xarray data locally to a pandas.DataFrame, where I apply rolling().quantile(). After that, I take the values of the new DataFrame and build an xarray.DataArray from them. The reproducible code:

import xarray as xr
import pandas as pd
import numpy as np

times = np.arange(0, 30)
locs = ['A', 'B', 'C', 'D'] 

signal = xr.DataArray(np.random.rand(len(times), len(locs)), 
                      coords=[times, locs], dims=['time', 'locations'])
window = 5

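# Round trip through pandas: rows are time steps, columns are locations
df = pd.DataFrame(data=signal.data)
# Centred rolling 25th percentile along the rows; dropna() removes the
# incomplete windows at both edges, leaving len(times) - window + 1 rows
roll = df.rolling(window=window, center=True).quantile(.25).dropna()
window_array = xr.DataArray(roll.values,
            coords=[np.arange(0, signal.time.shape[0] - window + 1), signal.locations],
            dims=['time', 'locations'])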
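The pandas round trip above renumbers time from 0, so the labels no longer tell which time step each window is centred on. A minimal variation, assuming a 2-D DataArray with dims ('time', 'locations') as in the question, converts with DataArray.to_pandas() and copies the surviving index back. The helper name rolling_quantile_pandas is hypothetical.

```python
import numpy as np
import xarray as xr


def rolling_quantile_pandas(da, window, q):
    """Hypothetical helper: centred rolling quantile via pandas that keeps the
    original 'time' labels. Assumes a 2-D DataArray with dims ('time', 'locations')."""
    # Rows = time steps, columns = locations (to_pandas returns a DataFrame for 2-D data)
    df = da.transpose('time', 'locations').to_pandas()
    roll = df.rolling(window=window, center=True).quantile(q).dropna()
    # dropna() removed the incomplete edge windows; roll.index holds the surviving labels
    return xr.DataArray(roll.to_numpy(),
                        coords={'time': roll.index.to_numpy(),
                                'locations': roll.columns.to_numpy()},
                        dims=('time', 'locations'))


times = np.arange(0, 30)
locs = ['A', 'B', 'C', 'D']
rng = np.random.default_rng(0)
signal = xr.DataArray(rng.random((len(times), len(locs))),
                      coords=[times, locs], dims=['time', 'locations'])
window_array = rolling_quantile_pandas(signal, window=5, q=.25)
```

With window=5 and center=True, the result keeps the labels of the time steps that have a complete window around them instead of starting again at 0.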

Any hint on how to stick to xarray as much as possible is welcome.

Let us consider the same problem, only smaller (10 time instances, 2 locations).

Here is the output of the first method (via pandas):

<xarray.DataArray (time: 8, locations: 2)>
array([[0.404362, 0.076203],
       [0.353639, 0.076203],
       [0.387167, 0.102917],
       [0.525404, 0.298231],
       [0.755646, 0.298231],
       [0.460749, 0.414935],
       [0.104887, 0.498813],
       [0.104887, 0.420935]])
Coordinates:
* time       (time) int32 0 1 2 3 4 5 6 7
* locations  (locations) <U1 'A' 'B'

Note that the 'time' dimension is smaller, because dropna() is called on the result of the rolling quantile. The new size of that dimension is len(times) - window + 1. Now, the output of the proposed method (via construct):

<xarray.DataArray (time: 10, locations: 2)>
array([[0.438426, 0.127881],
       [0.404362, 0.076203],
       [0.353639, 0.076203],
       [0.387167, 0.102917],
       [0.525404, 0.298231],
       [0.755646, 0.298231],
       [0.460749, 0.414935],
       [0.104887, 0.498813],
       [0.104887, 0.420935],
       [0.112651, 0.60338 ]])
Coordinates:
* time       (time) int32 0 1 2 3 4 5 6 7 8 9
* locations  (locations) <U1 'A' 'B'

It seems the dimensions are still (time, locations), with the former of size 10 rather than 8. In this example, since center=True, the two results coincide once the first and last rows of the second array are removed. Shouldn't the DataArray have a new dimension, tmp?
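A sketch of how the construct-based result can be made to match the pandas one exactly, assuming an odd window so that centred windows line up the same way in both libraries: pass skipna=False to DataArray.quantile so that any window touching the NaN padding at the edges yields NaN, then drop those time steps with dropna('time'). The data is random, as in the question.

```python
import numpy as np
import pandas as pd
import xarray as xr

times = np.arange(0, 30)
locs = ['A', 'B', 'C', 'D']
rng = np.random.default_rng(0)
signal = xr.DataArray(rng.random((len(times), len(locs))),
                      coords=[times, locs], dims=['time', 'locations'])
window = 5  # odd window, so centred labels agree with pandas

# construct() adds the 'tmp' dimension; quantile() reduces it away again.
# skipna=False turns every window that overlaps the edge padding into NaN,
# and dropna('time') removes those rows, mirroring pandas' dropna().
xr_result = (
    signal.rolling(time=window, center=True)
    .construct('tmp')
    .quantile(.25, dim='tmp', skipna=False)
    .dropna('time')
)

# pandas reference, as in the question
df = pd.DataFrame(data=signal.data)
roll = df.rolling(window=window, center=True).quantile(.25).dropna()

print(np.allclose(xr_result.values, roll.values))  # True
```

Unlike the pandas version, the xarray result keeps the original time labels of the complete windows.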

Also, this method (with bottleneck installed) takes longer than the pandas approach proposed initially. For example, in a case study with 1000 times x 2 locations, the pandas run takes 0.015 s, while the construct one takes 1.25 s.
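If the construct route is too slow, one option outside xarray's rolling API (a swapped-in technique, not the method from the answer) is to build the windows with NumPy's sliding_window_view (available in NumPy 1.20 and later), call np.quantile on them, and wrap the result back into a DataArray. The helper name rolling_quantile_np is hypothetical; it assumes an odd window and that every dimension carries a coordinate, and its speed has not been measured against the timings above.

```python
import numpy as np
import xarray as xr
from numpy.lib.stride_tricks import sliding_window_view


def rolling_quantile_np(da, dim, window, q):
    """Hypothetical helper: centred rolling quantile that drops incomplete windows,
    like pandas' rolling(center=True).quantile(q).dropna()."""
    if window % 2 != 1:
        raise ValueError("this sketch only handles odd window sizes")
    # Put the rolling dimension first so the windows run along axis 0
    da = da.transpose(dim, ...)
    # Read-only view of shape (n - window + 1, *other_dims, window)
    windows = sliding_window_view(da.values, window, axis=0)
    values = np.quantile(windows, q, axis=-1)
    # A centred window of odd length w is labelled by its middle element,
    # so the first complete window belongs to label index w // 2
    half = window // 2
    coords = {d: da[d].values for d in da.dims}
    coords[dim] = coords[dim][half:half + values.shape[0]]
    return xr.DataArray(values, coords=coords, dims=da.dims)


times = np.arange(0, 30)
locs = ['A', 'B', 'C', 'D']
rng = np.random.default_rng(0)
signal = xr.DataArray(rng.random((len(times), len(locs))),
                      coords=[times, locs], dims=['time', 'locations'])
result = rolling_quantile_np(signal, 'time', 5, .25)
```

Because only complete windows are formed, there is no NaN padding to skip or drop afterwards.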

Answer 1

You can use the construct method of the rolling object, which generates a new DataArray with the rolling dimension.

signal.rolling(time=window, center=True).construct('tmp').quantile(.25, dim='tmp')

Above, I construct a DataArray with an additional tmp dimension and compute the quantile along that dimension.
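To make the two steps of the one-liner visible, the sketch below prints the intermediate dimensions: construct('tmp') appends the window dimension, and quantile(dim='tmp') reduces it again, which is why the final DataArray has no tmp dimension. It also shows why the edge rows are not NaN: with float data, DataArray.quantile skips NaN by default, so the padded edge windows are computed from the values they do contain.

```python
import numpy as np
import xarray as xr

times = np.arange(0, 30)
locs = ['A', 'B', 'C', 'D']
rng = np.random.default_rng(0)
signal = xr.DataArray(rng.random((len(times), len(locs))),
                      coords=[times, locs], dims=['time', 'locations'])
window = 5

# Step 1: every centred window is laid out along a new, trailing 'tmp' dimension;
# positions that fall outside the 'time' range are filled with NaN
windows = signal.rolling(time=window, center=True).construct('tmp')
print(windows.dims)   # ('time', 'locations', 'tmp')
print(windows.shape)  # (30, 4, 5)

# Step 2: the quantile over 'tmp' removes that dimension again; NaNs are skipped,
# so the first and last time steps come from partial windows
q25 = windows.quantile(.25, dim='tmp')
print(q25.dims)       # ('time', 'locations')
```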

Answer 1 (4 votes, accepted), 2019-02-10 09:16:38
