
Sum values in a column grouped by another column (pandas)

My df looks like this:

country   id       x       y
AT        11      50     100
AT        12      NaN     90
AT        13      NaN    104
AT        22      40      50
AT        23      30      23
AT        61      40      88
AT        62      NaN     78  
UK        11      40      34
UK        12      NaN     22
UK        13      NaN     70
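To reproduce the question, the table above can be built as a DataFrame like this. This is a minimal sketch: the variable name `df` matches the answers below, and `np.nan` stands in for the NaN cells.

```python
import numpy as np
import pandas as pd

# Sample data from the question; NaN in x marks rows whose y should be
# added to the nearest preceding row that has an x value.
df = pd.DataFrame({
    "country": ["AT"] * 7 + ["UK"] * 3,
    "id": [11, 12, 13, 22, 23, 61, 62, 11, 12, 13],
    "x": [50, np.nan, np.nan, 40, 30, 40, np.nan, 40, np.nan, np.nan],
    "y": [100, 90, 104, 50, 23, 88, 78, 34, 22, 70],
})
print(df)
```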

What I need is, for each row where x is not NaN, the sum of y over that row and the NaN-x rows directly below it (for example ids 11, 12 and 13, which share the same first digit of id). This should be done separately for each country. At the end I just need to drop the rows where x is NaN.

The result should be something like this:

country   id       x       y
AT        11      50     294
AT        22      40      50
AT        23      30      23
AT        61      40     166
UK        11      40     126

You can aggregate with GroupBy.agg, using the first and sum functions, and group by a helper Series built by testing for non-missing values with Series.notna and taking the cumulative sum with Series.cumsum:

df1 = (df.groupby(['country', df['x'].notna().cumsum()])
         .agg({'id':'first', 'x':'first', 'y':'sum'})
         .reset_index(level=1, drop=True)
         .reset_index())
print (df1)
  country  id     x    y
0      AT  11  50.0  294
1      AT  22  40.0   50
2      AT  23  30.0   23
3      AT  61  40.0  166
4      UK  11  40.0  126
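To see what the helper Series looks like, here is a small sketch on the question's data. Each non-NaN x increments the running count, so the NaN rows below it inherit its group number. Because the helper Series keeps the name `x`, it appears as index level 1 after `groupby`, and `reset_index(level=1, drop=True)` discards it instead of trying to insert a second `x` column.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "country": ["AT"] * 7 + ["UK"] * 3,
    "id": [11, 12, 13, 22, 23, 61, 62, 11, 12, 13],
    "x": [50, np.nan, np.nan, 40, 30, 40, np.nan, 40, np.nan, np.nan],
    "y": [100, 90, 104, 50, 23, 88, 78, 34, 22, 70],
})

# True where x is present, False where it is NaN.
has_x = df["x"].notna()
# Running count of present x values: every non-NaN row opens a new group
# number, and the NaN rows after it inherit that number.
key = has_x.cumsum()
print(key.tolist())  # [1, 1, 1, 2, 3, 4, 4, 5, 5, 5]

# Summing y per (country, key) gives the same totals as the answer.
sums = df.groupby(["country", key])["y"].sum()
print(sums.tolist())  # [294, 50, 23, 166, 126]
```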

If the first value(s) of x can be missing, add DataFrame.dropna:

print (df)
   country  id     x    y
0       AT  11   NaN  100
1       AT  11  50.0  100
2       AT  12   NaN   90
3       AT  13   NaN  104
4       AT  22  40.0   50
5       AT  23  30.0   23
6       AT  61  40.0   88
7       AT  62   NaN   78
8       UK  11  40.0   34
9       UK  12   NaN   22
10      UK  13   NaN   70

df1 = (df.groupby(['country', df['x'].notna().cumsum()])
         .agg({'id':'first', 'x':'first', 'y':'sum'})
         .reset_index(level=1, drop=True)
         .reset_index()
         .dropna(subset=['x']))
print (df1)
  country  id     x    y
1      AT  11  50.0  294
2      AT  22  40.0   50
3      AT  23  30.0   23
4      AT  61  40.0  166
5      UK  11  40.0  126
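To see why the extra `dropna` is needed, the sketch below rebuilds the second sample and stops before `dropna`. The leading NaN row gets helper value 0 and forms a group of its own; `first` skips missing values, so with no non-missing x in that group the aggregated x is NaN, and that is the row `dropna(subset=['x'])` removes.

```python
import numpy as np
import pandas as pd

# Second sample from the answer: the first AT row has no x value.
df = pd.DataFrame({
    "country": ["AT"] * 8 + ["UK"] * 3,
    "id": [11, 11, 12, 13, 22, 23, 61, 62, 11, 12, 13],
    "x": [np.nan, 50, np.nan, np.nan, 40, 30, 40, np.nan, 40, np.nan, np.nan],
    "y": [100, 100, 90, 104, 50, 23, 88, 78, 34, 22, 70],
})

# The leading NaN row gets key 0, i.e. a group with no x value at all.
key = df["x"].notna().cumsum()
print(key.tolist())  # [0, 1, 1, 1, 2, 3, 4, 4, 5, 5, 5]

# Same aggregation as the answer, but without the final dropna.
raw = (df.groupby(["country", key])
         .agg({"id": "first", "x": "first", "y": "sum"})
         .reset_index(level=1, drop=True)
         .reset_index())
print(raw)  # row 0 is AT, 11, NaN, 100
```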

Alternatively, use groupby with transform, then dropna:

print (df.assign(y=df.groupby(df["x"].notnull().cumsum())["y"].transform('sum'))
         .dropna(subset=["x"]))

  country  id     x    y
0      AT  11  50.0  294
3      AT  22  40.0   50
4      AT  23  30.0   23
5      AT  61  40.0  166
7      UK  11  40.0  126
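The transform version groups only on the helper key. That works for this data because every country starts with a non-NaN x; if a country could start with missing values, its leading rows would share a key with the previous country's last group and their y would be added there. A sketch of a variant that also groups on `country`, mirroring the `agg` answer, is below; `fold_nan_rows` and the modified frame `bad` are hypothetical names used only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "country": ["AT"] * 7 + ["UK"] * 3,
    "id": [11, 12, 13, 22, 23, 61, 62, 11, 12, 13],
    "x": [50, np.nan, np.nan, 40, 30, 40, np.nan, 40, np.nan, np.nan],
    "y": [100, 90, 104, 50, 23, 88, 78, 34, 22, 70],
})

def fold_nan_rows(frame, by_country=True):
    """Hypothetical helper: sum y over each run that starts at a non-NaN x."""
    key = frame["x"].notna().cumsum()
    # Grouping on country too keeps runs from crossing country boundaries.
    groups = ["country", key] if by_country else key
    summed = frame.groupby(groups)["y"].transform("sum")
    return frame.assign(y=summed).dropna(subset=["x"])

print(fold_nan_rows(df))  # same rows and sums as the answer above

# Hypothetical counter-example: UK's first row has no x value.
bad = df.copy()
bad.loc[7, "x"] = np.nan
# Key-only grouping folds UK's y (34 + 22 + 70) into AT 61.
print(fold_nan_rows(bad, by_country=False)["y"].tolist())  # [294, 50, 23, 292]
print(fold_nan_rows(bad)["y"].tolist())                    # [294, 50, 23, 166]
```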

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM