Using pandas in Python to perform calculations on DataFrame objects and append them to a multi-index level of a groupby result

Question

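I am trying to add a row to a multi-index level, where the row holds calculations built from the individual rows of the ungrouped DataFrame. The calculated values are then added to the grouped DataFrame.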

import numpy as np
import pandas as pd
import random

years = [2000, 2001, 2002]
products = ["A", "B", "C"]

num_combos = 10

years = [random.choice(years) for i in range(num_combos)]
products = [random.choice(products) for i in range(num_combos)]

sum_values = list(range(0, num_combos))
random.shuffle(sum_values)
av_values = [random.randrange(0, num_combos, 1) for i in range(num_combos)]

cols = {"years": years,
        "products": products,
        "sum_col": sum_values,
        "av_col": av_values}

df = pd.DataFrame(cols)

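The DataFrame above is randomly generated. I have a df with many columns that I want to either sum per individual account or average per individual account. I can achieve this with the following: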

gdf = df.groupby(["products", "years"]).agg(s = ("sum_col", "sum"),
                                            a = ("av_col", "mean"))

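However, I now want to add a row, labeled "Total/Avg", at this multi-index level. For some columns, the "Total/Avg" row is the sum of the individual rows (for sums, I could simply sum the values at that level); for other columns, it is the mean of the individual rows. One solution is shown below: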

def addTotalAvgMultiindex(df):
    num_indexes = len(list(df.index.levels))
    if num_indexes == 3:
        a, b, c = df.index.levels
        df = df.reindex(pd.MultiIndex.from_product([a, b, [*c, 'Total/Avg']]))
    elif num_indexes == 4:
        a, b, c, d = df.index.levels
        df = df.reindex(pd.MultiIndex.from_product([a, b, c, [*d, 'Total/Avg']]))
    elif num_indexes == 2:
        a, b = df.index.levels
        df = df.reindex(pd.MultiIndex.from_product([a, [*b, 'Total/Avg']]))
    return df

gdf = addTotalAvgMultiindex(gdf)
gdf.index = gdf.index.set_names(["products", "years"])

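for col in gdf.columns:
    # Recompute each summary from the ungrouped rows, one column at a time
    if col == "s":
        total = df.groupby(["products"]).agg(total=("sum_col", "sum"))
    elif col == "a":
        total = df.groupby(["products"]).agg(total=("av_col", "mean"))

    total_values = [x for xs in total.values for x in xs]

    # Use .loc rather than chained indexing so the assignment reaches gdf itself
    mask = gdf.index.get_level_values("years") == "Total/Avg"
    gdf.loc[mask, col] = total_values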
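
As a side note on addTotalAvgMultiindex above: its three branches differ only in how many levels they unpack, so the same reindex can be written once for any number of levels. The sketch below is only an illustration under that reading; the function name add_label_to_last_level and the sample values are made up for this example and are not part of the question's code. Passing names= also keeps the level names, which would make the separate set_names call unnecessary.

```python
import pandas as pd

# Made-up sample rows, small enough to check by hand.
df = pd.DataFrame({
    "products": ["A", "A", "A", "B", "B"],
    "years": [2000, 2000, 2001, 2000, 2002],
    "sum_col": [1, 2, 3, 4, 5],
    "av_col": [2, 4, 6, 8, 10],
})
gdf = df.groupby(["products", "years"]).agg(s=("sum_col", "sum"),
                                            a=("av_col", "mean"))


def add_label_to_last_level(frame, label="Total/Avg"):
    """Hypothetical helper: reindex so every outer group gains a `label` row."""
    # Star-unpacking copes with any number of index levels (two or more).
    *outer, last = frame.index.levels
    new_index = pd.MultiIndex.from_product(
        [*outer, [*last, label]],
        names=frame.index.names,  # keep "products"/"years" instead of renaming later
    )
    # Rows that did not exist before (including the new label) are filled with NaN.
    return frame.reindex(new_index)


expanded = add_label_to_last_level(gdf)
```

Like the original function, this takes the full product of the levels, so combinations absent from the data (such as product A in 2002 here) also appear as all-NaN rows.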

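This seems tedious, especially if I have many columns (currently only sums or means are needed, but other measures, such as the median, could be added).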

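Is there a smarter way to add a row to the multi-index and compute its values from the individual rows of the df DataFrame (without having to reindex and rename the levels, then loop over the columns and fill in the values one at a time)? Assume that several columns need to be summed and several need to be averaged.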

Answer 1

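You can run the same aggregation on the same multi-index keys, with one of the key columns set to a constant value, and then combine that result with the result of the earlier aggregation.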

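# Collapse "years" to a constant label so each product yields one summary row
total_gdf = df.assign(years="Total/Avg").groupby(
    ["products", "years"]).agg(s=("sum_col", "sum"), a=("av_col", "mean"))
# gdf is the plain groupby result from the question, before any reindexing
result = pd.concat([gdf, total_gdf]).sort_index()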
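
To cover the case raised in the question, where many columns need different measures, the same idea can be wrapped in a small function that applies one aggregation spec twice. This is a sketch, not part of the original answer: with_total_rows, AGG_SPEC, the extra median column m, and the sample values are all assumptions made for illustration.

```python
import pandas as pd

# Made-up sample rows, small enough to check by hand.
df = pd.DataFrame({
    "products": ["A", "A", "A", "B", "B"],
    "years": [2000, 2000, 2001, 2000, 2002],
    "sum_col": [1, 2, 3, 4, 5],
    "av_col": [2, 4, 6, 8, 10],
})

# One spec drives both the detail rows and the summary rows, so adding a
# measure (here a median) is a one-line change.
AGG_SPEC = {
    "s": ("sum_col", "sum"),
    "a": ("av_col", "mean"),
    "m": ("av_col", "median"),
}


def with_total_rows(frame, keys, agg_spec, label="Total/Avg"):
    """Hypothetical helper: append one summary row per outer group."""
    detail = frame.groupby(keys).agg(**agg_spec)
    # Overwrite the innermost key with a constant label, so grouping again
    # collapses all rows of each outer group into a single summary row.
    summary = frame.assign(**{keys[-1]: label}).groupby(keys).agg(**agg_spec)
    # Same call as in the answer: stack both results and sort by the index.
    return pd.concat([detail, summary]).sort_index()


result = with_total_rows(df, ["products", "years"], AGG_SPEC)
print(result)
```

Because the summary rows are aggregated from the ungrouped rows, the Total/Avg mean for product A in this sample is 4.0 (the mean of 2, 4 and 6), not the 4.5 that averaging the two per-year means would give.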

Answer 1 (accepted), 2022-06-20 13:27:33
