
Python pandas group rows by date

I have attributes such as color and size, each with a start_date and an end_date. I want to group the rows by these dates and aggregate them by id and attribute value:

id color  size start_date end_date
A1 blue   m    1/1/2022   3/1/2022
A1 blue   l    3/1/2022   5/1/2022
A1 yellow l    5/1/2022   NaN

Expected output, one table per attribute:

id color  start_date end_date
A1 blue   1/1/2022   5/1/2022
A1 yellow 5/1/2022   NaN

id size start_date end_date
A1 m    1/1/2022   3/1/2022
A1 l    3/1/2022   NaN

By default, .groupby drops rows whose group keys contain NaN; passing dropna=False keeps them:

for group_label, group_df in df.groupby(['start_date', 'end_date'], dropna=False):
    ...
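
To make the effect of dropna concrete, here is a minimal sketch that rebuilds the sample data from the question and counts the groups both ways. The variable names (default_sizes, all_sizes) are only illustrative, and the dropna parameter assumes pandas 1.1 or newer.

```python
import numpy as np
import pandas as pd

# Sample data from the question; the open-ended interval has a missing end_date.
df = pd.DataFrame({
    "id": ["A1", "A1", "A1"],
    "color": ["blue", "blue", "yellow"],
    "size": ["m", "l", "l"],
    "start_date": ["1/1/2022", "3/1/2022", "5/1/2022"],
    "end_date": ["3/1/2022", "5/1/2022", np.nan],
})

keys = ["start_date", "end_date"]

# Default behaviour: rows whose key contains NaN are left out of every group.
default_sizes = df.groupby(keys).size()

# dropna=False keeps the NaN key as a group of its own.
all_sizes = df.groupby(keys, dropna=False).size()

print(len(default_sizes))  # 2
print(len(all_sizes))      # 3
```

The row with start_date 5/1/2022 only shows up in the second result, because its end_date is missing.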
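
# Replace missing values with the string 'NaN' first: the 'last' aggregation skips real NaN values
df.fillna('NaN').groupby('size').agg({'id': 'first', 'start_date': 'first', 'end_date': 'last'}).reset_index()
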
  size  id start_date  end_date
0    l  A1   3/1/2022       NaN
1    m  A1   1/1/2022  3/1/2022

df.fillna('NaN').groupby('color').agg({'id': 'first', 'start_date': 'first', 'end_date': 'last'}).reset_index()

    color  id start_date  end_date
0    blue  A1   1/1/2022  5/1/2022
1  yellow  A1   5/1/2022       NaN
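
The two calls above differ only in the attribute name, so they could be folded into one helper. The following is a minimal sketch, not the answerer's code: collapse_by and its placeholder parameter are hypothetical names, the rows are assumed to be in chronological order, and each attribute value is assumed to occupy a single contiguous run of dates. It also groups by id, as the question asks, and turns the placeholder back into a real NaN afterwards.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "id": ["A1", "A1", "A1"],
    "color": ["blue", "blue", "yellow"],
    "size": ["m", "l", "l"],
    "start_date": ["1/1/2022", "3/1/2022", "5/1/2022"],
    "end_date": ["3/1/2022", "5/1/2022", np.nan],
})


def collapse_by(frame, attribute, placeholder="NaN"):
    """Hypothetical helper: one row per (id, attribute value) with its overall date range."""
    # 'last' skips real NaN values, so mark open-ended intervals with a placeholder string.
    filled = frame.fillna({"end_date": placeholder})
    out = (
        filled.groupby(["id", attribute])
        .agg(start_date=("start_date", "first"), end_date=("end_date", "last"))
        .reset_index()
    )
    # Restore real missing values where the placeholder survived the aggregation.
    out["end_date"] = out["end_date"].mask(out["end_date"] == placeholder)
    return out


by_color = collapse_by(df, "color")
by_size = collapse_by(df, "size")
print(by_color)
print(by_size)
```

If an attribute value can come back after a gap (for example blue, yellow, blue), this grouping would merge the separate runs into one range, so a run-based grouping key would be needed instead.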

