
Pandas sum rows by group based on condition

I have weekly data grouped by region. I'm trying to figure out how to sum a set of rows based on a condition for each region. For example:

Region  | Week | Year | value
------------------------------
  R1    |   53 | 2016 | 10
  R1    |    1 | 2017 | 8
  R2    |   53 | 2017 | 10
  R2    |    1 | 2018 | 17
  R3    |   53 | 2018 | 30
  R3    |    1 | 2019 | 1

I would like to add each week-53 value from the previous year to the first week of the following year, so that the data becomes:

Region  | Week | Year | value
------------------------------
  R1    |    1 | 2017 | 18
  R2    |    1 | 2018 | 27
  R3    |    1 | 2019 | 31

Thanks.

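agg can be very useful here. If each region contains only a week-53 row and the week-1 row that follows it, as in the example, you can group by Region, take the latest Year, and sum value: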

df = df.groupby('Region', as_index=False).agg({'Year':'max', 'value':'sum'})

Output:

>>> df
  Region  Year  value
0     R1  2017     18
1     R2  2018     27
2     R3  2019     31
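Note that the result above no longer has the Week column that the desired output shows. A minimal sketch of one way to keep it, assuming (as in the sample data) that every region holds exactly one week-53 row plus the week-1 row that follows it, is to aggregate Week with 'min'. The variable name result is only illustrative.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'Region': ['R1', 'R1', 'R2', 'R2', 'R3', 'R3'],
    'Week':   [53, 1, 53, 1, 53, 1],
    'Year':   [2016, 2017, 2017, 2018, 2018, 2019],
    'value':  [10, 8, 10, 17, 30, 1],
})

# Assumption: each region has only a week-53 row and the following week-1 row,
# so min(Week) is 1 and max(Year) is the year that week 1 belongs to.
result = df.groupby('Region', as_index=False).agg(
    {'Week': 'min', 'Year': 'max', 'value': 'sum'}
)
print(result)
```

If a region contains other weeks as well, this collapses them into a single row too, so it is only suitable when the data has been filtered down to the week-53 and week-1 pairs first.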

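Combine Year and Week into a single value that can be parsed as a date. With the format used here, week 53 parses to a date early in the following year, which for the sample data falls in week 1.

Then, for the week-53 rows only, replace Year and Week with the year and ISO week read back from that date, group by Region, Year and Week, and sum: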

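import numpy as np
import pandas as pd

# Build "YYYYWWd" strings (d = 0, Sunday); week 53 parses into the next year.
s = pd.to_datetime((df['Year'] * 1000 + df['Week'] * 10).astype(str), format='%Y%W%w')
df = (df.assign(Year=np.where(df['Week'] == 53, s.dt.year, df['Year']),
                Week=np.where(df['Week'] == 53, s.dt.isocalendar().week.astype(int), df['Week']))
        .groupby(['Region', 'Year', 'Week']).agg('sum'))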
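The same roll-over can also be sketched without parsing dates. This is a simpler substitute technique rather than the approach above: it assumes that every week-53 row always belongs with week 1 of the following year, and it moves those rows with plain arithmetic. The extra week-2 row for R1 and the names is_week_53, rolled and result are hypothetical, added only to show that other weeks are left alone.

```python
import numpy as np
import pandas as pd

# Sample data from the question, plus one hypothetical week-2 row for R1.
df = pd.DataFrame({
    'Region': ['R1', 'R1', 'R1', 'R2', 'R2', 'R3', 'R3'],
    'Week':   [53, 1, 2, 53, 1, 53, 1],
    'Year':   [2016, 2017, 2017, 2017, 2018, 2018, 2019],
    'value':  [10, 8, 5, 10, 17, 30, 1],
})

is_week_53 = df['Week'] == 53

# Assumption: every week-53 row is folded into week 1 of the next year.
rolled = df.assign(
    Year=np.where(is_week_53, df['Year'] + 1, df['Year']),
    Week=np.where(is_week_53, 1, df['Week']),
)

# Rows that now share Region, Year and Week are summed together.
result = rolled.groupby(['Region', 'Year', 'Week'], as_index=False)['value'].sum()
print(result)
```

Because no calendar is involved, this version does not depend on which week-numbering convention (%W or ISO) produced the Week column, but it will also not notice a week 53 that should map to something other than week 1.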


 