
Pandas: filtered group subtotal applied to the entire group

Given the following DataFrame:

import pandas as pd
import numpy as np
np.random.seed(seed=1)
size=20
df = pd.DataFrame({"group": np.random.choice(["A", "B", "C"], size),
                   "exclude": np.random.choice(["Yes", "No"], size),
                   "value": np.random.randint(0, 5, size=size)}).sort_values(["group", "value", "exclude"])

For each group, I need a column holding the group subtotal of value, excluding the rows where exclude is "Yes". I am currently doing it with the following command:

df["group_sum"] = df[(df.exclude=="No")].groupby("group")["value"].transform("sum")

Unfortunately, the column is NaN for the excluded rows, because the filtered result only has index labels for the rows that were kept. To populate it, I then run:

df["group_sum"] = df.groupby("group")["group_sum"].transform("max")
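To see why the excluded rows end up empty after the first statement: the groupby/transform result is indexed only by the rows that survived the filter, and assigning a Series to a column aligns on the index. A minimal sketch on a small, hypothetical four-row frame (not the seeded data above):

```python
import pandas as pd

# Hypothetical toy data: one included and one excluded row per group
df = pd.DataFrame({
    "group":   ["A", "A", "B", "B"],
    "exclude": ["No", "Yes", "No", "Yes"],
    "value":   [1, 10, 2, 20],
})

# transform() on the filtered frame returns a Series indexed only by the kept rows
filtered_sum = df[df.exclude == "No"].groupby("group")["value"].transform("sum")
print(filtered_sum.index.tolist())  # [0, 2]

# Assigning it back aligns on the index, so the excluded rows (1 and 3) get NaN
df["group_sum"] = filtered_sum
print(df)
```

The second statement in the question then works because, within each group, every non-NaN group_sum value is the same subtotal and "max" skips NaN.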

Is there a way to combine the two statements into one?

You could use DataFrame.where, which keeps the non-selected rows but sets them to NaN, and then broadcast the result to the whole group with a second transform:

df["group_sum"] = (df.where(df.exclude == "No")
                     .groupby("group")["value"].transform("sum")
                     .groupby(df.group).transform("max"))

It gives:

   group exclude  value  group_sum
2      A      No      0       12.0
6      A      No      0       12.0
10     A      No      0       12.0
5      A     Yes      0       12.0
1      A     Yes      1       12.0
8      A      No      2       12.0
14     A      No      3       12.0
18     A      No      3       12.0
19     A      No      4       12.0
16     B      No      0        4.0
9      B      No      1        4.0
0      B     Yes      1        4.0
4      B     Yes      1        4.0
12     B     Yes      1        4.0
7      B      No      3        4.0
3      B     Yes      4        4.0
17     C      No      1        5.0
13     C     Yes      1        5.0
11     C     Yes      3        5.0
15     C      No      4        5.0
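A variant that is not from either answer, sketched here on a small hypothetical frame: instead of blanking whole rows, zero out only the excluded values with Series.where(cond, 0), so a single groupby transform both computes and broadcasts the subtotal. This assumes 0 is an acceptable stand-in for an excluded value, which holds for a sum.

```python
import pandas as pd

# Hypothetical toy data; group C has only excluded rows
df = pd.DataFrame({
    "group":   ["A", "A", "A", "B", "B", "C"],
    "exclude": ["No", "Yes", "No", "No", "Yes", "Yes"],
    "value":   [1, 10, 2, 3, 30, 5],
})

# Keep value where exclude == "No", otherwise replace it with 0
kept = df["value"].where(df["exclude"] == "No", 0)

# Sum per group and broadcast back to every row in one groupby/transform
df["group_sum"] = kept.groupby(df["group"]).transform("sum")
print(df)
```

One behavioural difference: a group whose rows are all excluded (C here) gets 0, whereas the where/transform("max") version above leaves NaN for it.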

Alternatively, you can use Series.map to map each row's group to the per-group sums computed by groupby:

df["group_sum"] = df["group"].map(df[df.exclude == "No"].groupby("group")["value"].sum())

print(df)

    group exclude  value  group_sum
 2      A      No      0         12
 6      A      No      0         12
 10     A      No      0         12
 5      A     Yes      0         12
 1      A     Yes      1         12
 8      A      No      2         12
 14     A      No      3         12
 18     A      No      3         12
 19     A      No      4         12
 16     B      No      0          4
 9      B      No      1          4
 0      B     Yes      1          4
 4      B     Yes      1          4
 12     B     Yes      1          4
 7      B      No      3          4
 3      B     Yes      4          4
 17     C      No      1          5
 13     C     Yes      1          5
 11     C     Yes      3          5
 15     C      No      4          5
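The map version works because groupby(...).sum() returns a Series indexed by group label, which map then uses as a lookup table for each row's group. A sketch on hypothetical data, assuming you want 0 rather than NaN for a group that has no included rows; the fillna(0).astype(int) step is an addition, not part of the answer above, and is not needed for the seeded data where every group has "No" rows:

```python
import pandas as pd

# Hypothetical toy data; group C has no rows with exclude == "No"
df = pd.DataFrame({
    "group":   ["A", "A", "B", "B", "C"],
    "exclude": ["No", "Yes", "No", "Yes", "Yes"],
    "value":   [1, 10, 2, 20, 5],
})

# One subtotal per group label, computed on the included rows only
subtotals = df[df["exclude"] == "No"].groupby("group")["value"].sum()
print(subtotals)

# map() looks up each row's group label in the index of `subtotals`;
# C is missing from that index, so it maps to NaN before fillna(0)
df["group_sum"] = df["group"].map(subtotals).fillna(0).astype(int)
print(df)
```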
