Given the DataFrame below:
import pandas as pd
import numpy as np

np.random.seed(seed=1)
size = 20
df = pd.DataFrame({"group": np.random.choice(["A", "B", "C"], size),
                   "exclude": np.random.choice(["Yes", "No"], size),
                   "value": np.random.randint(0, 5, size=size)}
                  ).sort_values(["group", "value", "exclude"])
For each group, I need a column with the group subtotal that excludes specific rows. I am doing it with the command below:
df["group_sum"] = df[(df.exclude=="No")].groupby("group")["value"].transform("sum")
Unfortunately, the column is NaN for the excluded records, because the filtered result only covers the rows where exclude is "No". To populate it, I run the following:
df["group_sum"] = df.groupby("group")["group_sum"].transform("max")
Is there a way to combine the two statements into one?
You could use where, which keeps the non-selected rows but sets their values to NaN:
df["group_sum"] = (
    df.where(df.exclude == "No")
      .groupby("group")["value"].transform("sum")
      .groupby(df.group).transform("max")
)
It gives:
group exclude value group_sum
2 A No 0 12.0
6 A No 0 12.0
10 A No 0 12.0
5 A Yes 0 12.0
1 A Yes 1 12.0
8 A No 2 12.0
14 A No 3 12.0
18 A No 3 12.0
19 A No 4 12.0
16 B No 0 4.0
9 B No 1 4.0
0 B Yes 1 4.0
4 B Yes 1 4.0
12 B Yes 1 4.0
7 B No 3 4.0
3 B Yes 4 4.0
17 C No 1 5.0
13 C Yes 1 5.0
11 C Yes 3 5.0
15 C No 4 5.0
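A possible variation on the same idea, sketched here on a small hand-made frame (the name toy and its values are made up for illustration, not taken from the question): mask only the value column with Series.where, replacing excluded values with 0, and group by the untouched group column. Under that assumption a single groupby is enough, and the result stays an integer.

```python
import pandas as pd

# Hypothetical toy data, not the frame from the question.
toy = pd.DataFrame({
    "group":   ["A", "A", "A", "B", "B"],
    "exclude": ["No", "Yes", "No", "No", "Yes"],
    "value":   [1, 2, 3, 5, 7],
})

# Keep values where exclude == "No", replace the others with 0,
# then sum per group and broadcast the sum back to every row.
toy["group_sum"] = (
    toy["value"]
    .where(toy["exclude"].eq("No"), 0)
    .groupby(toy["group"])
    .transform("sum")
)

print(toy)
```

Note that with this variant a group whose rows are all excluded ends up with 0 rather than NaN, which may or may not be what you want.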
You can use Series.map to map your group to the results from groupby:
df["group_sum"] = df["group"].map(df[df.exclude=="No"].groupby("group")["value"].sum())
print(df)
group exclude value group_sum
2 A No 0 12
6 A No 0 12
10 A No 0 12
5 A Yes 0 12
1 A Yes 1 12
8 A No 2 12
14 A No 3 12
18 A No 3 12
19 A No 4 12
16 B No 0 4
9 B No 1 4
0 B Yes 1 4
4 B Yes 1 4
12 B Yes 1 4
7 B No 3 4
3 B Yes 4 4
17 C No 1 5
13 C Yes 1 5
11 C Yes 3 5
15 C No 4 5
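To make the map approach easier to follow, here is a minimal sketch that splits it into its two steps on a small made-up frame (toy, totals, mapped and filled are illustrative names, not from the question). It also shows one edge case worth knowing about: a group with no "No" rows has no entry in the lookup Series, so map returns NaN for it. Filling with 0 afterwards is an assumption about the desired behaviour, not part of the original answer.

```python
import pandas as pd

# Hypothetical toy data; group "C" has only excluded rows.
toy = pd.DataFrame({
    "group":   ["A", "A", "B", "B", "C"],
    "exclude": ["No", "Yes", "No", "No", "Yes"],
    "value":   [1, 2, 3, 4, 5],
})

# Step 1: one subtotal per group, computed from the non-excluded rows only.
totals = toy.loc[toy["exclude"].eq("No")].groupby("group")["value"].sum()

# Step 2: look up each row's group in that Series.
mapped = toy["group"].map(totals)

# Optional: treat groups with no included rows as a subtotal of 0.
filled = mapped.fillna(0).astype(int)

print(totals)
print(mapped)
print(filled)
```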