Calculate time-based rolling average on multi-indexed dataframe
I have a multi-indexed DataFrame that I group using a mask. After that, I want to compute a time-based rolling average.
import pandas as pd
time = pd.date_range('2000-05-01', freq='24H', periods=10)
mult_index = pd.MultiIndex.from_product([time, [0,1]], names=["time", "number"])
data = pd.DataFrame(range(20), index=mult_index)
mask = list(range(5)) * 4
data.groupby(mask).rolling("2d", on=mult_index.levels[0]).mean()
However, this raises an exception:
Traceback (most recent call last):
File "C:\Users\bi4372\.conda\envs\EnergyTimeSeriesFramework\lib\site-packages\IPython\core\interactiveshell.py", line 3331, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-77-e484a6b352eb>", line 1, in <module>
rolling.count()
File "C:\Users\bi4372\.conda\envs\EnergyTimeSeriesFramework\lib\site-packages\pandas\core\window\common.py", line 40, in outer
return self._groupby.apply(f)
File "C:\Users\bi4372\.conda\envs\EnergyTimeSeriesFramework\lib\site-packages\pandas\core\groupby\groupby.py", line 735, in apply
result = self._python_apply_general(f)
File "C:\Users\bi4372\.conda\envs\EnergyTimeSeriesFramework\lib\site-packages\pandas\core\groupby\groupby.py", line 751, in _python_apply_general
keys, values, mutated = self.grouper.apply(f, self._selected_obj, self.axis)
File "C:\Users\bi4372\.conda\envs\EnergyTimeSeriesFramework\lib\site-packages\pandas\core\groupby\ops.py", line 206, in apply
res = f(group)
File "C:\Users\bi4372\.conda\envs\EnergyTimeSeriesFramework\lib\site-packages\pandas\core\window\common.py", line 38, in f
return getattr(x, name)(*args, **kwargs)
File "C:\Users\bi4372\.conda\envs\EnergyTimeSeriesFramework\lib\site-packages\pandas\core\window\rolling.py", line 1969, in count
return self._apply(window_func, center=self.center, name="count")
File "C:\Users\bi4372\.conda\envs\EnergyTimeSeriesFramework\lib\site-packages\pandas\core\window\rolling.py", line 518, in _apply
return self._wrap_results(results, block_list, obj, exclude)
File "C:\Users\bi4372\.conda\envs\EnergyTimeSeriesFramework\lib\site-packages\pandas\core\window\rolling.py", line 331, in _wrap_results
final.append(Series(self._on, index=obj.index, name=name))
File "C:\Users\bi4372\.conda\envs\EnergyTimeSeriesFramework\lib\site-packages\pandas\core\series.py", line 292, in __init__
f"Length of passed values is {len(data)}, "
ValueError: Length of passed values is 10, index implies 4
Does anyone know how to solve this? If I try it without a multi-indexed DataFrame, everything works fine:
time = pd.date_range('2000-05-01', freq='24H', periods=10)
data = pd.DataFrame(range(10), index=time)
mask = list(range(5)) * 2
data.groupby(mask).rolling("2d").mean()
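Thanks in advance for your help.
In the answer below, DavideBrex proposes resetting the index to work around this problem. However, with that solution the rows with number 0 interfere with the rows with number 1, and I would like to avoid that. See the following additional example: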
time = pd.date_range('2000-05-01', freq='24H', periods=3)
mult_index = pd.MultiIndex.from_product([time, [0,1]], names=["time", "number"])
data = pd.DataFrame(range(6), index=mult_index)
data.columns=["col"]
mask = [0,0,1,1,0,1]
res = data.reset_index(level='number').groupby(mask).rolling('2d').mean()
The desired result is:
              number  col
  time
0 2000-05-01     0.0  0.0
  2000-05-01     1.0  1.0
  2000-05-03     0.0  4.0
1 2000-05-02     0.0  2.0
  2000-05-02     1.0  3.0
  2000-05-03     1.0  4.0
However, the actual result is:
                number       col
  time
0 2000-05-01  0.000000  0.000000
  2000-05-01  0.500000  0.500000
  2000-05-03  0.000000  4.000000
1 2000-05-02  0.000000  2.000000
  2000-05-02  0.500000  2.500000
  2000-05-03  0.666667  3.333333
The problem is that groupby yields groups of 4 rows each:
for i, item in data.groupby(mask):
    print(item)
which gives:
                    0
time       number
2000-05-01 0        0
2000-05-03 1        5
2000-05-06 0       10
2000-05-08 1       15
                    0
time       number
2000-05-01 1        1
2000-05-04 0        6
2000-05-06 1       11
2000-05-09 0       16
..... .. ...
But you then pass 10 values to the rolling function:
print(mult_index.levels[0])
DatetimeIndex(['2000-05-01', '2000-05-02', '2000-05-03', '2000-05-04',
'2000-05-05', '2000-05-06', '2000-05-07', '2000-05-08',
'2000-05-09', '2000-05-10'],
dtype='datetime64[ns]', name='time', freq='24H')
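The lengths involved can be checked directly. The following is a small sketch using the data from the question (with `freq="D"`, which is equivalent to `'24H'` here). It shows that `mult_index.levels[0]` holds only the 10 unique timestamps, whereas each group has 4 rows, which is where the "10" and "4" in the error message come from.

```python
import pandas as pd

# Same setup as in the question ("D" is equivalent to '24H' here).
time = pd.date_range("2000-05-01", freq="D", periods=10)
mult_index = pd.MultiIndex.from_product([time, [0, 1]], names=["time", "number"])
data = pd.DataFrame(range(20), index=mult_index)
mask = list(range(5)) * 4

# Number of rows in each of the five groups.
group_sizes = data.groupby(mask).size().tolist()

# levels[0] contains only the unique timestamps, not one entry per row.
n_unique_times = len(mult_index.levels[0])

# get_level_values returns one timestamp per row of the frame.
n_rows = len(mult_index.get_level_values("time"))

print(group_sizes)      # rows per group
print(n_unique_times)   # length of the array passed via `on=`
print(n_rows)           # total number of rows in the frame
```

Neither the unique-level length nor the full row count matches the size of a single group, so passing an external array via `on=` cannot line up with the per-group data.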
Try this:
time = pd.date_range('2000-05-01', freq='24H', periods=10)
mult_index = pd.MultiIndex.from_product([time, [0,1]], names=["time", "number"])
data = pd.DataFrame(range(20), index=mult_index)
data.columns=["col"]
mask = list(range(5)) * 4
res = data.reset_index(level='number').groupby(mask).rolling('2d').mean()
res
Output:
   time        number   col
0  2000-05-01     0.0   0.0
0  2000-05-03     1.0   5.0
0  2000-05-06     0.0  10.0
0  2000-05-08     1.0  15.0
1  2000-05-01     1.0   1.0
1  2000-05-04     0.0   6.0
1  2000-05-06     1.0  11.0
1  2000-05-09     0.0  16.0
2  2000-05-02     0.0   2.0
2  2000-05-04     1.0   7.0
2  2000-05-07     0.0  12.0
2  2000-05-09     1.0  17.0
3  2000-05-02     1.0   3.0
3  2000-05-05     0.0   8.0
3  2000-05-07     1.0  13.0
3  2000-05-10     0.0  18.0
4  2000-05-03     0.0   4.0
4  2000-05-05     1.0   9.0
4  2000-05-08     0.0  14.0
4  2000-05-10     1.0  19.0
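For the follow-up in the question (rows with number 0 should not influence rows with number 1), one possible approach is to group by `number` as well as by the mask, so that each rolling window only sees rows that share the same `number`. The following is a minimal sketch and not part of the original answer. The column name `group` is introduced here purely for illustration, and the `'2d'` window is an assumption, chosen because the desired values shown in the question correspond to a two-day window. A reasonably recent pandas version is assumed.

```python
import pandas as pd

# Small example from the question ("D" is equivalent to '24H' here).
time = pd.date_range("2000-05-01", freq="D", periods=3)
mult_index = pd.MultiIndex.from_product([time, [0, 1]], names=["time", "number"])
data = pd.DataFrame({"col": range(6)}, index=mult_index)
mask = [0, 0, 1, 1, 0, 1]

# Move "number" into a column so the remaining index is a DatetimeIndex,
# and store the mask in a (hypothetical) "group" column so that both can
# be used as group keys by name.
flat = data.reset_index(level="number").assign(group=mask)

# Group by mask AND number: each time-based window is then computed
# separately for every (group, number) pair.
res = flat.groupby(["group", "number"])["col"].rolling("2d").mean()
print(res)
```

The result is a Series indexed by `group`, `number`, and `time`. Its values match the `col` column of the desired result in the question, although the row order differs because rows are arranged by `number` within each group.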