sum values in column grouped by another column pandas

Question

My df looks like this:

country   id       x       y
AT        11      50     100
AT        12      NaN     90
AT        13      NaN    104
AT        22      40      50
AT        23      30      23
AT        61      40      88
AT        62      NaN     78  
UK        11      40      34
UK        12      NaN     22
UK        13      NaN     70

For each row where x is not NaN, I need the sum of the y column over that row and the NaN rows that follow it, i.e. over the rows whose id starts with the same digit. This should be done separately for each country. At the end I just need to drop the rows where x is NaN.

The result should be something like this:

country   id       x       y
AT        11      50     294
AT        22      40      50
AT        23      30      23
AT        61      40     166
UK        11      40     126
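
For anyone who wants to reproduce this, the sample frame can probably be rebuilt as below. This is only a sketch: it assumes id and y are integers and that the blanks in x are real NaN values, which the printed table suggests but does not prove.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table above; np.nan marks the missing x values.
df = pd.DataFrame({
    "country": ["AT"] * 7 + ["UK"] * 3,
    "id": [11, 12, 13, 22, 23, 61, 62, 11, 12, 13],
    "x": [50, np.nan, np.nan, 40, 30, 40, np.nan, 40, np.nan, np.nan],
    "y": [100, 90, 104, 50, 23, 88, 78, 34, 22, 70],
})
print(df)
```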

Answer 1

You can aggregate with GroupBy.agg, taking first for id and x and sum for y. Group by country together with a helper Series that flags the non-missing values with Series.notna and numbers the resulting blocks with a cumulative sum, Series.cumsum:

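# the helper key increments at every non-missing x, so the NaN rows below share its block
df1 = (df.groupby(['country', df['x'].notna().cumsum()])
         .agg({'id':'first', 'x':'first', 'y':'sum'})
         .reset_index(level=1, drop=True)   # drop the helper level
         .reset_index())                    # turn country back into a column
print(df1)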
  country  id     x    y
0      AT  11  50.0  294
1      AT  22  40.0   50
2      AT  23  30.0   23
3      AT  61  40.0  166
4      UK  11  40.0  126
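
To see why the grouping works, it can help to print the helper Series next to the data. A minimal sketch, assuming the sample frame from the question (rebuilt here so the snippet runs on its own; the names `starts` and `key` are only for illustration):

```python
import numpy as np
import pandas as pd

# Sample frame from the question, rebuilt so this snippet is self-contained.
df = pd.DataFrame({
    "country": ["AT"] * 7 + ["UK"] * 3,
    "id": [11, 12, 13, 22, 23, 61, 62, 11, 12, 13],
    "x": [50, np.nan, np.nan, 40, 30, 40, np.nan, 40, np.nan, np.nan],
    "y": [100, 90, 104, 50, 23, 88, 78, 34, 22, 70],
})

# True where x holds a value; every True marks the start of a new block.
starts = df["x"].notna()

# Running count of block starts: a NaN row keeps the number of the
# last non-missing x above it.
key = starts.cumsum()

print(df.assign(key=key))
# key column: 1, 1, 1, 2, 3, 4, 4, 5, 5, 5
```

Each non-missing x bumps the counter, so the NaN rows underneath inherit the same number and their y values are summed into that block.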

If the first value(s) of x may be missing, add DataFrame.dropna:

print(df)
   country  id     x    y
0       AT  11   NaN  100
1       AT  11  50.0  100
2       AT  12   NaN   90
3       AT  13   NaN  104
4       AT  22  40.0   50
5       AT  23  30.0   23
6       AT  61  40.0   88
7       AT  62   NaN   78
8       UK  11  40.0   34
9       UK  12   NaN   22
10      UK  13   NaN   70

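# the leading NaN row gets helper key 0 and forms its own block with x = NaN
df1 = (df.groupby(['country', df['x'].notna().cumsum()])
         .agg({'id':'first', 'x':'first', 'y':'sum'})
         .reset_index(level=1, drop=True)
         .reset_index()
         .dropna(subset=['x']))   # remove the block that has no x value
print(df1)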
  country  id     x    y
1      AT  11  50.0  294
2      AT  22  40.0   50
3      AT  23  30.0   23
4      AT  61  40.0  166
5      UK  11  40.0  126

Answer 2

Use groupby, transform and dropna:

print(df.assign(y=df.groupby(df["x"].notnull().cumsum())["y"].transform('sum'))
        .dropna(subset=["x"]))

  country  id     x    y
0      AT  11  50.0  294
3      AT  22  40.0   50
4      AT  23  30.0   23
5      AT  61  40.0  166
7      UK  11  40.0  126
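
Note that the cumulative count does not reset between countries, so grouping on the helper alone relies on every country starting with a non-missing x. The sketch below compares both groupings on a small made-up frame (`demo` and its values are hypothetical, not the asker's data) in which UK begins with a NaN row:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first UK row has no x value.
demo = pd.DataFrame({
    "country": ["AT", "AT", "UK", "UK"],
    "id": [11, 12, 11, 12],
    "x": [50, np.nan, np.nan, 40],
    "y": [1, 2, 4, 8],
})

key = demo["x"].notna().cumsum()

# Helper only: the leading UK row (y=4) is summed into AT's block.
helper_only = (demo.assign(y=demo.groupby(key)["y"].transform("sum"))
                   .dropna(subset=["x"]))

# Country plus helper: the leading UK row forms its own block,
# which dropna then removes.
with_country = (demo.assign(y=demo.groupby(["country", key])["y"].transform("sum"))
                    .dropna(subset=["x"]))

print(helper_only)   # y is 7 for AT and 8 for UK
print(with_country)  # y is 3 for AT and 8 for UK
```

If the data can contain such rows, adding country to the groupby keys keeps the sums separate per country, as the question asks.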


solution1
2 2020-04-15 07:43:46

solution2
0 ACCEPTED 2020-04-15 07:49:48
