Cumulative sum based on another column's boolean value
I have a pandas dataframe with the following format:
name | is_valid | account | transaction
Adam | True | debit | +10
Adam | False | credit | +10
Adam | True | credit | +10
Benj | True | credit | +10
Benj | False | debit | +10
Adam | True | credit | +10
I want to create two new columns, credit_cumulative and debit_cumulative. credit_cumulative is the cumulative sum of the transaction column for the person in that row, counting only rows whose account is credit and whose is_valid is True. debit_cumulative should behave the same way for debit rows.
In the above example, the result should be:
name | is_valid | account | transaction | credit_cumulative | debit_cumulative
Adam | True | debit | +10 | 0 | 10
Adam | False | credit | +10 | 0 | 10
Adam | True | credit | +10 | 10 | 10
Benj | True | credit | +10 | 10 | 0
Benj | False | debit | +10 | 10 | 0
Adam | True | credit | +10 | 20 | 10
To illustrate, the first row is Adam, the account is debit, and is_valid is True, so we increase debit_cumulative by 10.
For the second row, is_valid is False, so the transaction does not count. The name is Adam, so Adam's credit_cumulative and debit_cumulative remain the same.
All rows should behave this way.
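The rule described above can be pinned down with a plain-Python sketch that keeps one running total per person and account. This is only a reference for the expected behavior, not a pandas solution; the function name running_totals and the tuple layout of rows are assumptions made for illustration.

```python
from collections import defaultdict

# Rows as (name, is_valid, account, transaction), copied from the example table.
rows = [
    ("Adam", True, "debit", 10),
    ("Adam", False, "credit", 10),
    ("Adam", True, "credit", 10),
    ("Benj", True, "credit", 10),
    ("Benj", False, "debit", 10),
    ("Adam", True, "credit", 10),
]


def running_totals(rows):
    """Return a (credit_cumulative, debit_cumulative) pair for every row."""
    # One pair of running totals per person.
    totals = defaultdict(lambda: {"credit": 0, "debit": 0})
    out = []
    for name, is_valid, account, amount in rows:
        # A transaction only counts when the row is valid.
        if is_valid:
            totals[name][account] += amount
        # Every row reports that person's totals so far, valid or not.
        out.append((totals[name]["credit"], totals[name]["debit"]))
    return out


result = running_totals(rows)
```

Each pair in result corresponds to the credit_cumulative and debit_cumulative values of the same row in the expected table.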
Here is the code to create the original data I described:
import pandas as pd

d = {
    'name': ['Adam', 'Adam', 'Adam', 'Benj', 'Benj', 'Adam'],
    'is_valid': [True, False, True, True, False, True],
    'account': ['debit', 'credit', 'credit', 'credit', 'debit', 'credit'],
    'transaction': [10, 10, 10, 10, 10, 10],
}
df = pd.DataFrame(data=d)
Try:
# credit: cumulative sum over the valid credit rows only
mask = df.is_valid.eq(True) & df.account.eq("credit")
df.loc[mask, "credit_cumulative"] = (
    df[mask].groupby(["name", "account"])["transaction"].cumsum()
)
# carry each person's last total forward; transform keeps the original index
df["credit_cumulative"] = df.groupby("name")["credit_cumulative"].transform(
    lambda x: x.ffill().fillna(0)
)

# debit: same steps for the valid debit rows
mask = df.is_valid.eq(True) & df.account.eq("debit")
df.loc[mask, "debit_cumulative"] = (
    df[mask].groupby(["name", "account"])["transaction"].cumsum()
)
df["debit_cumulative"] = df.groupby("name")["debit_cumulative"].transform(
    lambda x: x.ffill().fillna(0)
)

print(df)
Prints:
   name  is_valid  account  transaction  credit_cumulative  debit_cumulative
0  Adam      True    debit           10                0.0              10.0
1  Adam     False   credit           10                0.0              10.0
2  Adam      True   credit           10               10.0              10.0
3  Benj      True   credit           10               10.0               0.0
4  Benj     False    debit           10               10.0               0.0
5  Adam      True   credit           10               20.0              10.0
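As a possible variation on the same idea, the sketch below zeroes out every transaction that should not count and then takes a per-name cumulative sum, which avoids the forward-fill step and keeps integer dtypes. It assumes the column names from the question; the helper names counts and contribution are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Adam", "Adam", "Adam", "Benj", "Benj", "Adam"],
    "is_valid": [True, False, True, True, False, True],
    "account": ["debit", "credit", "credit", "credit", "debit", "credit"],
    "transaction": [10, 10, 10, 10, 10, 10],
})

for acct in ("credit", "debit"):
    # True only for valid rows of this account type.
    counts = df["is_valid"] & df["account"].eq(acct)
    # Keep the transaction where it counts, otherwise contribute 0.
    contribution = df["transaction"].where(counts, 0)
    # Running total per person, aligned with the original row order.
    df[f"{acct}_cumulative"] = contribution.groupby(df["name"]).cumsum()

print(df)
```

Because rows that do not count contribute 0 instead of NaN, each person's running total carries forward on its own.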
Here are a few ways to do what your question asks:
Method 1:
dfc = pd.concat([
    df[['name','is_valid']],
    df.transaction[df.account=='credit'].reindex(df.index, fill_value=0).rename('credit_cumulative'),
    df.transaction[df.account=='debit'].reindex(df.index, fill_value=0).rename('debit_cumulative')
], axis=1)
dfc.loc[~dfc.is_valid, ['credit_cumulative', 'debit_cumulative']] = 0
df = pd.concat([df, dfc.drop(columns='is_valid').groupby('name').cumsum()], axis=1)
Output:
   name  is_valid  account  transaction  credit_cumulative  debit_cumulative
0  Adam      True    debit           10                  0                10
1  Adam     False   credit           10                  0                10
2  Adam      True   credit           10                 10                10
3  Benj      True   credit           10                 10                 0
4  Benj     False    debit           10                 10                 0
5  Adam      True   credit           10                 20                10
Explanation:
Split transaction into two new columns for credit and debit, and add these to the name and is_valid columns of the original dataframe.
Zero out the new columns in rows where is_valid is False.
Use groupby().cumsum() to aggregate these columns by name.
Use concat() to add the cumsum() columns to the original dataframe.
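To make those steps concrete, here is a sketch that runs Method 1 one step at a time on the question's data so the intermediate values can be inspected. The intermediate names credit, debit, and running are assumptions introduced for illustration; the calls themselves are the ones used in Method 1.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Adam", "Adam", "Adam", "Benj", "Benj", "Adam"],
    "is_valid": [True, False, True, True, False, True],
    "account": ["debit", "credit", "credit", "credit", "debit", "credit"],
    "transaction": [10, 10, 10, 10, 10, 10],
})
cols = ["credit_cumulative", "debit_cumulative"]

# Step 1: one column per account type; rows of the other type become 0.
credit = df.transaction[df.account == "credit"].reindex(df.index, fill_value=0)
debit = df.transaction[df.account == "debit"].reindex(df.index, fill_value=0)
dfc = pd.concat(
    [df[["name", "is_valid"]], credit.rename(cols[0]), debit.rename(cols[1])],
    axis=1,
)

# Step 2: invalid rows must not contribute to either total.
dfc.loc[~dfc.is_valid, cols] = 0

# Step 3: running total per person; the grouping column is not in the result.
running = dfc.drop(columns="is_valid").groupby("name").cumsum()

# Step 4: attach the running totals to the original rows.
result = pd.concat([df, running], axis=1)
print(result)
```

Printing dfc after step 2 shows the per-row contributions that the cumulative sum in step 3 adds up.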
Method 2:
If we want, we can go further and replace the dfc assignment by factoring out the similar processing of credit and debit into a list comprehension, as in the following:
dfc = pd.concat([
    df[['name','is_valid']],
    *[df.transaction[df.account==transType].reindex(df.index, fill_value=0).rename(
        transType + '_cumulative') for transType in ('credit', 'debit')]
], axis=1)
Method 3:
Another alternative for the dfc assignment, using unstack(), is this:
dfc = pd.concat([
    df[['name','is_valid']],
    df.set_index('account', append=True)['transaction'].unstack(level=-1, fill_value=0).rename(
        columns={x:x + '_cumulative' for x in df.account.unique()})
], axis=1)
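Methods 2 and 3 only replace the dfc assignment, so the last two lines of Method 1 still need to run afterwards. As a sketch, assuming the question's data, the unstack() variant can be run end to end like this (the name result is chosen here for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Adam", "Adam", "Adam", "Benj", "Benj", "Adam"],
    "is_valid": [True, False, True, True, False, True],
    "account": ["debit", "credit", "credit", "credit", "debit", "credit"],
    "transaction": [10, 10, 10, 10, 10, 10],
})

# Pivot account values into columns; missing (row, account) pairs become 0.
dfc = pd.concat([
    df[["name", "is_valid"]],
    df.set_index("account", append=True)["transaction"]
      .unstack(level=-1, fill_value=0)
      .rename(columns={x: x + "_cumulative" for x in df.account.unique()}),
], axis=1)

# Same remaining steps as Method 1: drop invalid rows' contributions,
# then take the per-name running total and attach it to the original frame.
dfc.loc[~dfc.is_valid, ["credit_cumulative", "debit_cumulative"]] = 0
result = pd.concat(
    [df, dfc.drop(columns="is_valid").groupby("name").cumsum()], axis=1
)
print(result)
```

One thing to keep in mind with this variant is that the new column names come from the values present in account, so an account type that never appears in the data gets no column.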