Cumulative sum based on another column's boolean value

Question

I have a pandas dataframe with the following format:

name | is_valid | account | transaction 
Adam |  True    |  debit  |   +10       
Adam |  False   |  credit |   +10       
Adam |  True    |  credit |   +10       
Benj |  True    |  credit |   +10       
Benj |  False   |  debit  |   +10       
Adam |  True    |  credit |   +10

I want to create two new columns, credit_cumulative and debit_cumulative. credit_cumulative should hold the running sum of the transaction column for the person in that row, counting a transaction only when its account is credit and its is_valid value is True. debit_cumulative should behave the same way for debit transactions.

In the above example, the result should be:

name | is_valid | account | transaction | credit_cumulative | debit_cumulative
Adam |  True    |  debit  |   +10       |       0           |        10
Adam |  False   |  credit |   +10       |       0           |        10
Adam |  True    |  credit |   +10       |       10          |        10
Benj |  True    |  credit |   +10       |       10          |        0
Benj |  False   |  debit  |   +10       |       10          |        0
Adam |  True    |  credit |   +10       |       20          |        10

To illustrate, the first row is Adam, account is debit, and is_valid is True, so we increase Adam's debit_cumulative by 10.

For the second row, is_valid is False, so the transaction does not count. The name is Adam, so his credit_cumulative and debit_cumulative remain the same.

All rows should behave this way.

Here is the code for the original data I described:

import pandas as pd

d = {'name': ['Adam', 'Adam', 'Adam', 'Benj', 'Benj', 'Adam'], 'is_valid': [True, False, True, True, False, True], 'account': ['debit', 'credit', 'credit', 'credit', 'debit', 'credit'], 'transaction': [10, 10, 10, 10, 10, 10]}
df = pd.DataFrame(data=d)

Answer 1

Try:

# credit

mask = df.is_valid.eq(True) & df.account.eq("credit")
df.loc[mask, "credit_cumulative"] = (
    df[mask].groupby(["name", "account"])["transaction"].cumsum()
)

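# Carry the last total forward within each name; rows before the first valid credit get 0.
# (GroupBy.ffill keeps the original index, unlike apply() on recent pandas versions.)
df["credit_cumulative"] = df.groupby("name")["credit_cumulative"].ffill().fillna(0)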

# debit

mask = df.is_valid.eq(True) & df.account.eq("debit")
df.loc[mask, "debit_cumulative"] = (
    df[mask].groupby(["name", "account"])["transaction"].cumsum()
)

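# Same forward fill for debit; rows before the first valid debit get 0.
# (GroupBy.ffill keeps the original index, unlike apply() on recent pandas versions.)
df["debit_cumulative"] = df.groupby("name")["debit_cumulative"].ffill().fillna(0)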

print(df)

Prints:

   name  is_valid account  transaction  credit_cumulative  debit_cumulative
0  Adam      True   debit           10                0.0              10.0
1  Adam     False  credit           10                0.0              10.0
2  Adam      True  credit           10               10.0              10.0
3  Benj      True  credit           10               10.0               0.0
4  Benj     False   debit           10               10.0               0.0
5  Adam      True  credit           10               20.0              10.0
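The credit and debit blocks above differ only in the account name, so they can be folded into a loop. The following is a minimal sketch of that idea, not the answerer's code: the loop variables account and col are illustrative names, and the final astype(int) is an added step that assumes integer totals are wanted (the NaN placeholder step is what produces the floats in the output above).

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Adam", "Adam", "Adam", "Benj", "Benj", "Adam"],
    "is_valid": [True, False, True, True, False, True],
    "account": ["debit", "credit", "credit", "credit", "debit", "credit"],
    "transaction": [10, 10, 10, 10, 10, 10],
})

for account in ("credit", "debit"):
    col = f"{account}_cumulative"
    mask = df["is_valid"] & df["account"].eq(account)
    # Running total over the valid rows of this account, per name.
    # Rows outside the mask are aligned in as NaN for now.
    df[col] = df.loc[mask].groupby("name")["transaction"].cumsum()
    # Carry the last total forward within each name, start at 0,
    # and cast back to int (an assumption about the desired dtype).
    df[col] = df.groupby("name")[col].ffill().fillna(0).astype(int)

print(df)
```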

Answer 2

Here are a few ways to do what your question asks:

Method 1:

dfc = pd.concat([
    df[['name','is_valid']], 
    df.transaction[df.account=='credit'].reindex(df.index, fill_value=0).rename('credit_cumulative'),
    df.transaction[df.account=='debit'].reindex(df.index, fill_value=0).rename('debit_cumulative')
], axis=1)
dfc.loc[~dfc.is_valid, ['credit_cumulative', 'debit_cumulative']] = 0
df = pd.concat([df, dfc.drop(columns='is_valid').groupby('name').cumsum()], axis=1)

Output:

   name  is_valid account  transaction  credit_cumulative  debit_cumulative
0  Adam      True   debit           10                  0                10
1  Adam     False  credit           10                  0                10
2  Adam      True  credit           10                 10                10
3  Benj      True  credit           10                 10                 0
4  Benj     False   debit           10                 10                 0
5  Adam      True  credit           10                 20                10

Explanation:

Create a new dataframe that splits transaction into two new columns, one for credit and one for debit, and places them alongside the name and is_valid columns of the original dataframe.
Zero out these new columns where is_valid is False.
Use groupby().cumsum() to accumulate these columns by name.
Use concat() to add the cumsum() columns to the original dataframe.
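The zero-out-then-cumsum idea in these steps can also be sketched without the intermediate dataframe. This is a hedged variant rather than the method above: it uses Series.where to keep an amount only on rows that are valid and match the account, then groups that series by name. The names counted and account are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Adam", "Adam", "Adam", "Benj", "Benj", "Adam"],
    "is_valid": [True, False, True, True, False, True],
    "account": ["debit", "credit", "credit", "credit", "debit", "credit"],
    "transaction": [10, 10, 10, 10, 10, 10],
})

for account in ("credit", "debit"):
    # Keep the amount where the row is valid and matches the account; 0 elsewhere.
    counted = df["transaction"].where(df["is_valid"] & df["account"].eq(account), 0)
    # Running total of the counted amounts within each name.
    df[f"{account}_cumulative"] = counted.groupby(df["name"]).cumsum()

print(df)
```

Because no NaN placeholders are introduced, the new columns keep the integer dtype of transaction.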

Method 2:

If we want, we can go further and replace the dfc assignment by factoring the similar processing of credit and debit into a list comprehension, as follows:

dfc = pd.concat([
    df[['name','is_valid']], 
    *[df.transaction[df.account==transType].reindex(df.index, fill_value=0).rename(
    transType + '_cumulative') for transType in ('credit', 'debit')]
], axis=1)

Method 3:

Another alternative for the dfc assignment, using unstack(), is this:

dfc = pd.concat([
    df[['name','is_valid']],
    df.set_index('account', append=True)['transaction'].unstack(level=-1, fill_value=0).rename(
    columns={x:x + '_cumulative' for x in df.account.unique()})
], axis=1)
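Methods 2 and 3 replace only the dfc assignment; the zero-out and cumsum lines from Method 1 still have to follow. As a sketch of how the pieces fit together, here is Method 3 run end to end on the sample data. The variable names wide and result are illustrative and not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Adam", "Adam", "Adam", "Benj", "Benj", "Adam"],
    "is_valid": [True, False, True, True, False, True],
    "account": ["debit", "credit", "credit", "credit", "debit", "credit"],
    "transaction": [10, 10, 10, 10, 10, 10],
})

# Method 3: spread transaction into one column per account value,
# filling the cells that have no matching row with 0.
wide = (
    df.set_index("account", append=True)["transaction"]
    .unstack(level=-1, fill_value=0)
    .rename(columns={x: x + "_cumulative" for x in df.account.unique()})
)
dfc = pd.concat([df[["name", "is_valid"]], wide], axis=1)

# Remaining steps from Method 1: drop invalid rows' amounts, then accumulate by name.
dfc.loc[~dfc.is_valid, ["credit_cumulative", "debit_cumulative"]] = 0
result = pd.concat(
    [df, dfc.drop(columns="is_valid").groupby("name").cumsum()], axis=1
)

print(result)
```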
