Pandas: Cumulative sum from 2 columns with conditions
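Suppose I have two farms, A and B. Every week there are different animals at each of them. How can I get the cumulative number of the animal currently at each farm?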
+---+-----+--------+-----+--------+
| | A | Farm_A | B | Farm_B |
+---+-----+--------+-----+--------+
| 0 | dog | 1 | cat | 1 |
| 1 | cat | 0 | dog | 1 |
| 2 | cat | 0 | dog | 1 |
| 3 | cat | 1 | dog | 0 |
| 4 | dog | 1 | dog | 1 |
| 5 | dog | 0 | dog | 0 |
| 6 | dog | 1 | cat | 1 |
+---+-----+--------+-----+--------+
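To follow along, the table above can be rebuilt as a DataFrame. This is only a minimal sketch: the column names and values are copied from the table, and a default integer index is assumed.

```python
import pandas as pd

# Sample data copied from the table above (one row per week).
df = pd.DataFrame({
    "A": ["dog", "cat", "cat", "cat", "dog", "dog", "dog"],
    "Farm_A": [1, 0, 0, 1, 1, 0, 1],
    "B": ["cat", "dog", "dog", "dog", "dog", "dog", "cat"],
    "Farm_B": [1, 1, 1, 0, 1, 0, 1],
})
print(df)
```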
Using groupby I can get the cumsum for each farm separately:
df['A cumsum Farm_A'] = df.groupby(['A'])['Farm_A'].cumsum()
df['B cumsum Farm_B'] = df.groupby(['B'])['Farm_B'].cumsum()
+---+-----+--------+-----+--------+-----------------+-----------------+
| | A | Farm_A | B | Farm_B | A cumsum Farm_A | B cumsum Farm_B |
+---+-----+--------+-----+--------+-----------------+-----------------+
| 0 | dog | 1 | cat | 1 | 1 | 1 |
| 1 | cat | 0 | dog | 1 | 0 | 1 |
| 2 | cat | 0 | dog | 1 | 0 | 2 |
| 3 | cat | 1 | dog | 0 | 1 | 2 |
| 4 | dog | 1 | dog | 1 | 2 | 3 |
| 5 | dog | 0 | dog | 0 | 2 | 3 |
| 6 | dog | 1 | cat | 1 | 3 | 2 |
+---+-----+--------+-----+--------+-----------------+-----------------+
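My question is: how can I get, for each row, the cumulative sum of that row's animal across both farms A and B?
For example, in row 3 the animal at farm A is a cat, so I want the total number of cats from farms A and B over rows 0, 1, 2 and 3 = 2 cats.
Also in row 3, the animal at farm B is a dog, so I want the total number of dogs at both farms over rows 0, 1, 2 and 3 = 3 dogs.
This is what I want to achieve: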
+---+-----+--------+-----+--------+-----------------+-----------------+-----------------+-----------------+
| | A | Farm_A | B | Farm_B | A cumsum Farm_A | B cumsum Farm_B | A at both farms | B at both farms |
+---+-----+--------+-----+--------+-----------------+-----------------+-----------------+-----------------+
| 0 | dog | 1 | cat | 1 | 1 | 1 | 1 | 1 |
| 1 | cat | 0 | dog | 1 | 0 | 1 | 1 | 2 |
| 2 | cat | 0 | dog | 1 | 0 | 2 | 1 | 3 |
| 3 | cat | 1 | dog | 0 | 1 | 2 | 2 | 3 |
| 4 | dog | 1 | dog | 1 | 2 | 3 | 4 | 5 |
| 5 | dog | 0 | dog | 0 | 2 | 3 | 5 | 5 |
| 6 | dog | 1 | cat | 1 | 3 | 2 | 6 | 3 |
+---+-----+--------+-----+--------+-----------------+-----------------+-----------------+-----------------+
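You can create the last two columns using dummies. This lets you build a cumsum for each animal type across both farms, which you can then look up to get the appropriate value for each row.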
import pandas as pd
res = pd.get_dummies(df, columns=['A', 'B'])
# Animals only count if dummy & exists, so need to multiply.
res = pd.concat([res.filter(like='A_').multiply(res.Farm_A, axis=0),
                 res.filter(like='B_').multiply(res.Farm_B, axis=0)],
                axis=1)
# Cumsum per animal: add up the A_/B_ columns of each animal, then accumulate down the rows.
# Grouping on the transposed frame avoids groupby(axis=1), which is deprecated since pandas 2.1.
res = res.T.groupby(res.columns.str.split('_').str[1]).sum().T.cumsum()
# cat dog
#0 1 1
#1 1 2
#2 1 3
#3 2 3
#4 2 5
#5 2 5
#6 3 6
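# Lookup: for each row, pick the running total of that row's animal.
# DataFrame.lookup was removed in pandas 2.0, so index the underlying array by position.
rows = range(len(df))
df['A at both'] = res.to_numpy()[rows, res.columns.get_indexer(df.A)]
df['B at both'] = res.to_numpy()[rows, res.columns.get_indexer(df.B)]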
A Farm_A B Farm_B A at both B at both
0 dog 1 cat 1 1 1
1 cat 0 dog 1 1 2
2 cat 0 dog 1 1 3
3 cat 1 dog 0 2 3
4 dog 1 dog 1 5 5
5 dog 0 dog 0 5 5
6 dog 1 cat 1 6 3
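The same per-animal running totals can also be reached without dummy columns, by stacking both farms into one long table first. The following is a minimal sketch rather than a definitive implementation: it assumes the sample data from the question with a default integer index, and the names `long`, `totals`, `week`, `animal` and `count` are illustrative, not from the original posts.

```python
import numpy as np
import pandas as pd

# Sample data from the question (assumed default RangeIndex, one row per week).
df = pd.DataFrame({
    "A": ["dog", "cat", "cat", "cat", "dog", "dog", "dog"],
    "Farm_A": [1, 0, 0, 1, 1, 0, 1],
    "B": ["cat", "dog", "dog", "dog", "dog", "dog", "cat"],
    "Farm_B": [1, 1, 1, 0, 1, 0, 1],
})

# Stack both farms into one long table: one row per (week, farm).
long = pd.concat([
    df[["A", "Farm_A"]].rename(columns={"A": "animal", "Farm_A": "count"}),
    df[["B", "Farm_B"]].rename(columns={"B": "animal", "Farm_B": "count"}),
]).rename_axis("week").reset_index()

# Weekly count per animal across both farms, then a running total per animal.
totals = (
    long.groupby(["week", "animal"])["count"].sum()
        .unstack(fill_value=0)   # weeks as rows, animals as columns
        .cumsum()
        .reindex(df.index)       # align row order with df
)

# For each row, pick the running total of the animal named in column A (or B).
rows = np.arange(len(df))
values = totals.to_numpy()
df["A at both"] = values[rows, totals.columns.get_indexer(df["A"])]
df["B at both"] = values[rows, totals.columns.get_indexer(df["B"])]
print(df)
```

The long format keeps the per-animal aggregation in a single groupby, which may be easier to extend if more farms are added, at the cost of an extra reshape.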