Pandas DataFrame resample() and aggregate() with MultiIndex columns

I have a pandas DataFrame with a DatetimeIndex (1 level) and MultiIndex columns (2 levels). I am trying to resample over the DatetimeIndex while applying a different aggregation function to each second-level column.

Here's a code example:

df.groupby([Grouper(freq="5Min"),
            df.columns.unique(level=1)]).agg({"sub_col_0_name": "min",
                                              "sub_col_1_name": "max",
                                              "sub_col_2_name": "mean",
                                              "sub_col_3_name": "std"})

I am getting the following error:

ValueError: Grouper and axis must be same length

Can someone explain how to resample over the DatetimeIndex while aggregating different columns with different functions? Thank you.

Using the following toy DataFrame:

import numpy as np
import pandas as pd

arrays = [
    ["bar", "bar", "bar", "bar", "foo", "foo", "foo", "foo"],
    ["one", "two", "three", "four", "one", "two", "three", "four"],
]

tuples = list(zip(*arrays))

index = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])

df = pd.DataFrame(
    np.random.randint(10, size=(10, 8)),
    index=pd.date_range("2022-09-17", periods=10, freq="1min"),
    columns=index,
)

print(df)
# Output
first               bar                foo
second              one two three four one two three four
2022-09-17 00:00:00   0   1     1    5   4   6     6    0
2022-09-17 00:01:00   7   3     9    0   2   4     8    4
2022-09-17 00:02:00   7   8     0    2   9   5     1    5
2022-09-17 00:03:00   8   4     6    2   5   5     3    0
2022-09-17 00:04:00   4   2     7    3   2   6     3    2
2022-09-17 00:05:00   0   5     9    9   2   3     3    4
2022-09-17 00:06:00   4   8     8    0   6   6     3    9
2022-09-17 00:07:00   4   7     6    7   6   8     7    3
2022-09-17 00:08:00   2   4     9    8   2   5     1    3
2022-09-17 00:09:00   6   7     0    4   6   5     8    6

Here is one way to do it, using the Pandas resample and MultiIndex.from_product methods:

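# Resample each top-level group separately, then glue the pieces side by side
df = pd.concat(
    [
        df.loc[:, (col,)]  # sub-frame for one top-level label ("bar" or "foo")
        .resample("5min")
        .agg({"one": "min", "two": "max", "three": "mean", "four": "std"})
        for col in df.columns.get_level_values(0).unique()
    ],
    axis=1,
)

# Rebuild the two-level column index that was lost in the concatenation
df.columns = pd.MultiIndex.from_product(
    [["bar", "foo"], ["one", "two", "three", "four"]]
)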
print(df)
# Output
                    bar                     foo
                    one two three      four one two three      four
2022-09-17 00:00:00   0   8   4.6  1.816590   2   6   4.2  2.280351
2022-09-17 00:05:00   0   8   6.4  3.646917   2   8   4.4  2.549510
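
As a variation on the approach above, here is a minimal sketch that avoids hard-coding the top-level labels. It assumes every second-level label maps to exactly one aggregation. The names `funcs`, `pieces`, and `result` are illustrative and not part of the original answer, and the seeded generator is only there to make the toy data reproducible. Each column is resampled as a Series with the function looked up from its second-level label, and the original MultiIndex is reattached at the end.

```python
import numpy as np
import pandas as pd

# Toy frame with the same shape as above (seeded, so values differ from the printout)
rng = np.random.default_rng(0)
columns = pd.MultiIndex.from_product(
    [["bar", "foo"], ["one", "two", "three", "four"]],
    names=["first", "second"],
)
df = pd.DataFrame(
    rng.integers(10, size=(10, 8)),
    index=pd.date_range("2022-09-17", periods=10, freq="1min"),
    columns=columns,
)

# Assumed mapping: second-level label -> aggregation function
funcs = {"one": "min", "two": "max", "three": "mean", "four": "std"}

# Resample every column on its own; col is a tuple such as ("bar", "one")
pieces = [df[col].resample("5min").agg(funcs[col[1]]) for col in df.columns]

# Concatenate in the original column order and restore the MultiIndex
result = pd.concat(pieces, axis=1)
result.columns = df.columns

print(result)
```

Because the pieces are built by iterating over `df.columns`, the column order of `result` matches the input, so reassigning `df.columns` is safe here. If only some columns were aggregated, the new column index would need to be built from the same subset.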
