Pandas pivot_table() aggfunc: aggregation conditioned on multiple columns?

Question

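I want to aggregate one column with a Pandas pivot table, but the custom aggregation should be conditioned on a different column of the DataFrame.

See the example below. Suppose I want to sum the "Number_mentions" column for each value in the "Newspaper" column, but only where "Number_mentions" is above a threshold. That is easy with a custom aggfunc. But what if, in addition, I only want to include those "Number_mentions" values that are not on the same row as the value "RU" in the "Country" column? It seems that aggfunc only receives one column, isolated from the others, and I don't know how to get the whole DataFrame into the aggfunc so that I can do conditional subsetting inside it.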

import pandas as pd

df = pd.DataFrame({"Number_mentions": [1,5,2,3,6,5],
                   "Newspaper": ["Newspaper1", "Newspaper1", "Newspaper2", "Newspaper3", "Newspaper4", "Newspaper5"],
                   "Country": ["US", "US", "CN", "CN", "RU", "RU"]})

def articles_above_thresh_with_condition(input_series, thresh=2):
    series_bool = input_series > thresh
    # ! add some if condition based on additional column in df: 
    # ! only aggregate those values where column "Country" is not "RU". 
    # ? code ? 
    n_articles_above_thresh = sum(series_bool)
    return n_articles_above_thresh

df_piv = pd.pivot_table(df, values=["Number_mentions"],
                        index="Newspaper", columns=None, margins=False,
                        aggfunc=articles_above_thresh_with_condition)

Answer 1

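You need a different approach, because the aggfunc passed to pivot_table receives the values of one column at a time, so it cannot look at a second column such as "Country".

So first replace the non-matching values with missing values using Series.where, then aggregate this new column: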

# Keep Number_mentions where Country is not 'RU'; all other rows become NaN
df["Number_mentions1"] = df["Number_mentions"].where(df["Country"].ne('RU'))
print(df)
   Number_mentions   Newspaper Country  Number_mentions1
0                1  Newspaper1      US               1.0
1                5  Newspaper1      US               5.0
2                2  Newspaper2      CN               2.0
3                3  Newspaper3      CN               3.0
4                6  Newspaper4      RU               NaN
5                5  Newspaper5      RU               NaN
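
Why the NaN mask is enough: any comparison against NaN evaluates to False, so a masked row can never pass the threshold test inside the aggregation function. A minimal check, using a made-up three-element series rather than the question's data:

```python
import numpy as np
import pandas as pd

# Illustrative values only: two real numbers and one masked (NaN) entry
masked = pd.Series([1.0, 5.0, np.nan])

# NaN > 2 is False, so the masked entry is not counted
above = masked > 2
print(above.tolist())    # [False, True, False]
print(int(above.sum()))  # 1
```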

# Pivot on the masked column instead of the original one
df_piv = pd.pivot_table(df, values=["Number_mentions1"],
                        index="Newspaper", columns=None, margins=False,
                        aggfunc=articles_above_thresh_with_condition)
print(df_piv)
            Number_mentions1
Newspaper                   
Newspaper1               1.0
Newspaper2               0.0
Newspaper3               1.0
Newspaper4               0.0
Newspaper5               0.0
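
As an alternative sketch (not part of the original answer), the grouping can also be done with groupby directly, iterating over the groups so that each one is a sub-DataFrame in which both "Number_mentions" and "Country" are visible. The helper name count_above_thresh_excluding and its parameters are hypothetical, chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Number_mentions": [1, 5, 2, 3, 6, 5],
    "Newspaper": ["Newspaper1", "Newspaper1", "Newspaper2",
                  "Newspaper3", "Newspaper4", "Newspaper5"],
    "Country": ["US", "US", "CN", "CN", "RU", "RU"],
})

def count_above_thresh_excluding(group, thresh=2, excluded_country="RU"):
    """Hypothetical helper: count rows above thresh, skipping one country."""
    keep = (group["Number_mentions"] > thresh) & (group["Country"] != excluded_country)
    return int(keep.sum())

# Each `group` is a DataFrame holding all columns for one newspaper
counts = {
    newspaper: count_above_thresh_excluding(group)
    for newspaper, group in df.groupby("Newspaper")
}
result = pd.Series(counts, name="Number_mentions").rename_axis("Newspaper")
print(result)
```

On the sample data this yields the same counts as the pivot table above (1, 0, 1, 0, 0) without adding a helper column.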

Answer 1
0 2020-11-20 11:05:19
