Resampling with a pandas.MultiIndex: Resampler.aggregate() and Resampler[column]

Question

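I am trying to resample a DataFrame. First, I want to keep several aggregations in the result. Second, there is an additional aggregation that is of interest for one specific column. Because this aggregation is relevant only to that single column, the resampler can be restricted to that column so the aggregation is not applied needlessly to the others.

This approach works for a simple one-dimensional column index: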

import numpy as np
import pandas as pd
df = pd.DataFrame(data=np.random.rand(50,4), index=pd.to_datetime(np.arange(0, 50), unit="s"), columns=["a", "b", "c", "d"])
r = df.resample("10s")
result = r.aggregate(["mean", "std"])
result[("d", "ffill")] = r["d"].ffill()
print(result)

However, as soon as I use MultiIndex columns, problems appear. First, I can no longer keep several aggregations at once:

df.columns = pd.MultiIndex.from_product([("a", "b"), ("alpha", "beta")])
r = df.resample("10s")    # can be omitted
result = r.aggregate(["mean", "std"])
---> AttributeError: 'Series' object has no attribute 'columns'

Second, the resampler can no longer be restricted to the relevant column:

r[("b", "beta")].ffill()
--> KeyError: "Columns not found: 'b', 'beta'"

How can I carry my approach over from a simple index to a MultiIndex?

Answer 1

You can use pd.Grouper inside groupby instead of resample, for example:

result = df.groupby(pd.Grouper(freq='10s',level=0)).aggregate(["mean", "std"])
print (result)
                           a                                       b  \
                        alpha                beta               alpha   
                         mean       std      mean       std      mean   
1970-01-01 00:00:00  0.460569  0.312508  0.476511  0.260534  0.479577   
1970-01-01 00:00:10  0.441498  0.315277  0.487855  0.306068  0.535842   
1970-01-01 00:00:20  0.569884  0.248503  0.320552  0.288479  0.507755   
1970-01-01 00:00:30  0.478037  0.262654  0.552214  0.251581  0.505132   
1970-01-01 00:00:40  0.611227  0.328916  0.473773  0.241604  0.358298   


                                   beta            
                          std      mean       std  
1970-01-01 00:00:00  0.357493  0.448487  0.294432  
1970-01-01 00:00:10  0.259145  0.472250  0.320954  
1970-01-01 00:00:20  0.369490  0.432944  0.150473  
1970-01-01 00:00:30  0.298759  0.381614  0.248785  
1970-01-01 00:00:40  0.203831  0.381412  0.374965

For the second part I am not sure what you mean, but judging by the result you get in the single-level column case, you could try the following:

result[("b", "beta",'ffill')] = df.groupby(pd.Grouper(freq='10s',level=0))[[("b", "beta")]].first()
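The two steps of this answer can be put together in one self-contained sketch. It keeps the groupby plus pd.Grouper aggregation from above; for the single-column step it assumes a slightly different route than the line above, namely selecting the column from the DataFrame first and resampling only that Series, which sidesteps the KeyError from the question. The seeded random generator and the name b_beta_first are illustrative assumptions, not part of the original post.

```python
import numpy as np
import pandas as pd

# Same layout as in the question: 50 one-second samples, two-level column index.
rng = np.random.default_rng(0)  # seed chosen only to make the sketch reproducible
df = pd.DataFrame(
    rng.random((50, 4)),
    index=pd.to_datetime(np.arange(50), unit="s"),
    columns=pd.MultiIndex.from_product([("a", "b"), ("alpha", "beta")]),
)

# Aggregations for all columns, via groupby + pd.Grouper as in this answer.
result = df.groupby(pd.Grouper(freq="10s", level=0)).aggregate(["mean", "std"])

# Extra aggregation for one column only: pick the column first (a Series),
# then resample just that Series (assumed alternative to indexing the resampler).
b_beta_first = df[("b", "beta")].resample("10s").first()

# The aggregated frame has three column levels, so the new key needs three entries.
result[("b", "beta", "first")] = b_beta_first
print(result)
```

Selecting the column before resampling means the extra aggregation touches only that column, which is what the question asked for.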

Answer 2

This must be a bug in aggregate. A workaround is to stack:

(df.stack().groupby(level=-1)
  .apply(lambda x:x.resample('10s', level=0).aggregate(["mean", "std"]))
  .unstack(level=0)
)
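If reshaping with stack is undesirable, another possible workaround is to run each aggregation separately on the resampler and join the results with pd.concat. This is a swapped-in technique and not the stack approach shown above; the seeded data and the level order chosen for the output are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Data shaped like the question's: 50 one-second samples, two-level column index.
rng = np.random.default_rng(0)  # seed chosen only to make the sketch reproducible
df = pd.DataFrame(
    rng.random((50, 4)),
    index=pd.to_datetime(np.arange(50), unit="s"),
    columns=pd.MultiIndex.from_product([("a", "b"), ("alpha", "beta")]),
)

r = df.resample("10s")

# Call each aggregation on its own instead of passing a list to aggregate().
# The dict keys become a new outermost column level: (stat, level0, level1).
result = pd.concat({"mean": r.mean(), "std": r.std()}, axis=1)

# Move the statistic name to the innermost level, giving (level0, level1, stat),
# and sort the columns so each original column's statistics sit together.
result = result.reorder_levels([1, 2, 0], axis=1).sort_index(axis=1)
print(result)
```

The resulting frame has the same three-level column layout that aggregate(["mean", "std"]) produces in the single-index case, without passing a list of functions to the resampler.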

Solution 1
3 Accepted 2019-09-10 15:41:20

Solution 2
2 2019-09-10 15:37:38