Cumulative count at group level with boolean indexing in Pandas

Question

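I am working on an answer to this question that uses only boolean indexing instead of cumcount. The expected output is a total_paid_invoices column that, for each company, counts the number of invoices paid before each record (in terms of datetime).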

    company  invoice  date
0   A        1234     20120201
1   A        1134     20120201
2   A        1011     20120201
3   A        1123     20121004
4   A        1111     20121004
5   A        1224     20121105
6   B        1156     20120403
7   B        2345     20120504
8   B        4567     20120504
9   B        8796     20120606

I handle the company groups in a for loop rather than in a groupby:

for company in df.company.unique():
    df['total_paid_invoices'] = df.date.apply(
        lambda x: df.loc[(df.date<x)&(df.company==company)].shape[0]
    )

However, the output is incorrect in row 5 (the value should be 5):

    company  invoice  date        total_paid_invoices
0   A        1234     2012-02-01  0
1   A        1134     2012-02-01  0
2   A        1011     2012-02-01  0
3   A        1123     2012-10-04  4
4   A        1111     2012-10-04  4
5   A        1224     2012-11-05  4

This is why I am asking: when I run the operation by hand for the date in row 5,

df.loc[(df.date<df.date.iloc[5])&(df.company=='A')].shape[0]

the output is 5. Why does this value not make it into the output dataframe, while the rest of the values we see in the sample data are handled correctly?
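The behaviour can be reproduced with a minimal sketch, assuming the sample data above is loaded with the dates parsed as datetimes. The DataFrame construction and the names after_loop and manual_check are illustrative and not part of the original post. It runs the loop as written and then the standalone check, so the two results can be compared side by side.

```python
import pandas as pd

# Sample data copied from the question; dates are parsed so "<" compares datetimes.
df = pd.DataFrame({
    "company": list("AAAAAABBBB"),
    "invoice": [1234, 1134, 1011, 1123, 1111, 1224, 1156, 2345, 4567, 8796],
    "date": pd.to_datetime(
        ["20120201", "20120201", "20120201", "20121004", "20121004",
         "20121105", "20120403", "20120504", "20120504", "20120606"],
        format="%Y%m%d",
    ),
})

# The loop from the question: every pass assigns the WHOLE column.
for company in df.company.unique():
    df["total_paid_invoices"] = df.date.apply(
        lambda x: df.loc[(df.date < x) & (df.company == company)].shape[0]
    )

# Column contents once the loop has finished (hypothetical helper name).
after_loop = df["total_paid_invoices"].tolist()

# The standalone check from the question, for the date in row 5 and company "A".
manual_check = df.loc[(df.date < df.date.iloc[5]) & (df.company == "A")].shape[0]

print(after_loop)
print(manual_check)
```

In this sketch the column holds, for every row, the number of company B invoices dated before that row, since B is the last value returned by unique(). The standalone check counts company A invoices, which is why it gives a different number for row 5.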

Answer 1

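Thanks to @rafaelc for pointing out the overwriting problem. Each pass of the original loop assigns the whole column, so only the counts computed for the last company survive. You need to index by company on both sides of the =, so that the lambda function is applied to one subset of the dataframe at a time: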

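for company in df.company.unique():
    # Restrict both the assignment target and the dates being processed
    # to the rows of the current company.
    df.loc[df.company==company, 'total_paid_invoices'] = df.loc[df.company==company, 'date'].apply(
        lambda x: df.loc[(df.date<x)&(df.company==company)].shape[0]
    )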
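For a version that can be run end to end, here is a minimal sketch of the same fix on the question's sample data. The DataFrame construction, the in_company mask, the zero initialisation of the column and the result name are assumptions added for illustration. Initialising the column first keeps it as an integer column instead of passing through NaN while only some companies have been filled in.

```python
import pandas as pd

# Sample data copied from the question; dates are parsed so "<" compares datetimes.
df = pd.DataFrame({
    "company": list("AAAAAABBBB"),
    "invoice": [1234, 1134, 1011, 1123, 1111, 1224, 1156, 2345, 4567, 8796],
    "date": pd.to_datetime(
        ["20120201", "20120201", "20120201", "20121004", "20121004",
         "20121105", "20120403", "20120504", "20120504", "20120606"],
        format="%Y%m%d",
    ),
})

# Assumption: create the column up front so it starts as integers.
df["total_paid_invoices"] = 0

for company in df.company.unique():
    in_company = df.company == company  # boolean mask for the current company
    # Only this company's rows are written, and only this company's dates are processed.
    df.loc[in_company, "total_paid_invoices"] = df.loc[in_company, "date"].apply(
        lambda x: ((df.date < x) & in_company).sum()
    )

result = df["total_paid_invoices"].tolist()
print(df)
```

Summing the combined boolean mask is equivalent to taking shape[0] of the filtered frame; it just avoids building the intermediate frame on every call.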

Answer 1: accepted, 2021-04-03 16:28:45
