
Pandas DataFrame resample() and aggregate() with MultiIndex columns

I have a pandas DataFrame with a single-level DatetimeIndex and two-level MultiIndex columns. I am trying to resample over the DatetimeIndex while applying different aggregation functions to different columns.

Here's a code example:

df.groupby([Grouper(freq="5Min"),
            df.columns.unique(level=1)]).agg({"sub_col_0_name": "min",
                                              "sub_col_1_name": "max",
                                              "sub_col_2_name": "mean",
                                              "sub_col_3_name": "std"})

I am getting the following error:

ValueError: Grouper and axis must be same length

Can someone explain how to resample over the DatetimeIndex and aggregate different columns with different functions? Thank you.

With the following toy dataframe:

import numpy as np
import pandas as pd

arrays = [
    ["bar", "bar", "bar", "bar", "foo", "foo", "foo", "foo"],
    ["one", "two", "three", "four", "one", "two", "three", "four"],
]

tuples = list(zip(*arrays))

index = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])

df = pd.DataFrame(
    np.random.randint(10, size=(10, 8)),
    index=pd.date_range("2022-09-17", periods=10, freq="1min"),
    columns=index,
)

print(df)
# Output
first               bar                foo
second              one two three four one two three four
2022-09-17 00:00:00   0   1     1    5   4   6     6    0
2022-09-17 00:01:00   7   3     9    0   2   4     8    4
2022-09-17 00:02:00   7   8     0    2   9   5     1    5
2022-09-17 00:03:00   8   4     6    2   5   5     3    0
2022-09-17 00:04:00   4   2     7    3   2   6     3    2
2022-09-17 00:05:00   0   5     9    9   2   3     3    4
2022-09-17 00:06:00   4   8     8    0   6   6     3    9
2022-09-17 00:07:00   4   7     6    7   6   8     7    3
2022-09-17 00:08:00   2   4     9    8   2   5     1    3
2022-09-17 00:09:00   6   7     0    4   6   5     8    6

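Here is one way to do it using the pandas resample and MultiIndex.from_product methods: resample and aggregate each top-level column group separately, then concatenate the results and rebuild the two-level columns: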

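# For each top-level label ("bar", "foo"), select its sub-columns,
# resample into 5-minute bins, and aggregate each sub-column with its own function.
df = pd.concat(
    [
        df.loc[:, (col,)]
        .resample("5min")
        .agg({"one": "min", "two": "max", "three": "mean", "four": "std"})
        for col in df.columns.get_level_values(0).unique()
    ],
    axis=1,
)

# The concatenated frame has flat, repeated column labels; rebuild the two levels.
df.columns = pd.MultiIndex.from_product(
    [["bar", "foo"], ["one", "two", "three", "four"]]
)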
print(df)
# Output
                    bar                     foo
                    one two three      four one two three      four
2022-09-17 00:00:00   0   8   4.6  1.816590   2   6   4.2  2.280351
2022-09-17 00:05:00   0   8   6.4  3.646917   2   8   4.4  2.549510
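
As a variation on the same idea, here is a minimal sketch that resamples each column on its own and picks the function from the second-level label, so the top-level labels do not need to be hardcoded when rebuilding the columns. It reuses the values printed for the toy dataframe above so the numbers can be compared; the names `agg_by_second`, `pieces`, and `result` are only illustrative.

```python
import pandas as pd

# Same values as the toy dataframe printed above, hardcoded for reproducibility.
data = [
    [0, 1, 1, 5, 4, 6, 6, 0],
    [7, 3, 9, 0, 2, 4, 8, 4],
    [7, 8, 0, 2, 9, 5, 1, 5],
    [8, 4, 6, 2, 5, 5, 3, 0],
    [4, 2, 7, 3, 2, 6, 3, 2],
    [0, 5, 9, 9, 2, 3, 3, 4],
    [4, 8, 8, 0, 6, 6, 3, 9],
    [4, 7, 6, 7, 6, 8, 7, 3],
    [2, 4, 9, 8, 2, 5, 1, 3],
    [6, 7, 0, 4, 6, 5, 8, 6],
]
columns = pd.MultiIndex.from_product(
    [["bar", "foo"], ["one", "two", "three", "four"]], names=["first", "second"]
)
df = pd.DataFrame(
    data,
    index=pd.date_range("2022-09-17", periods=10, freq="1min"),
    columns=columns,
)

# Illustrative mapping: second-level label -> aggregation function.
agg_by_second = {"one": "min", "two": "max", "three": "mean", "four": "std"}

# Each column key is a (first, second) tuple, so df[col] is a Series.
# Resample that Series and apply the function chosen by its second-level label.
pieces = [
    df[col].resample("5min").agg(agg_by_second[col[1]]) for col in df.columns
]

# Pieces are in the same order as df.columns, so the original
# MultiIndex can be reassigned directly.
result = pd.concat(pieces, axis=1)
result.columns = df.columns
print(result)
```

This does one resample per column instead of one per top-level group, which is simpler to read but may be slower on very wide frames.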
