I have a data frame that looks something like this:
df =
date      name   val1  val2
-----------------------------------
14:55:00  name1     1     2
14:55:00  name1     2     4
15:00:00  name2     3     6
15:00:00  name3     4     8
15:05:00  name4     5    10
15:05:00  name5     6    12
What I would like to do is aggregate the data if the dates are the same - but only if the name is different. So the above data frame should actually become:
df_new =
date      name         val1  val2
-----------------------------------------
15:00:00  name2+name3     7    14
15:05:00  name4+name5    11    22
My current attempt gets close:
df_new = df.groupby("date", as_index=False).agg({"name" : "+".join, "val1" : "sum", "val2" : "sum"})
However, this also aggregates the rows where the name is the same, which it shouldn't.
EDIT: Note that there are only a few different names, and they repeat in each date interval. The only constraint is that rows aggregated under the same date must not share a name.
Can this be fixed?
Look for the duplicates, drop them, and then aggregate on the date column:
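(df.drop_duplicates(subset=['date', 'name'], keep=False)  # drop every row whose (date, name) pair repeats
   .groupby('date')[['val1', 'val2']]  # sum only the numeric columns, so 'name' is never concatenated
   .sum()
)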
          val1  val2
date
15:00:00     7    14
15:05:00    11    22
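To see which rows keep=False actually removes, here is a minimal sketch. It assumes the frame is rebuilt by hand from the table in the question; the variable names mask and kept are only for illustration.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "date": ["14:55:00", "14:55:00", "15:00:00", "15:00:00", "15:05:00", "15:05:00"],
    "name": ["name1", "name1", "name2", "name3", "name4", "name5"],
    "val1": [1, 2, 3, 4, 5, 6],
    "val2": [2, 4, 6, 8, 10, 12],
})

# keep=False flags every member of a repeated (date, name) pair,
# not only the second and later occurrences.
mask = df.duplicated(subset=["date", "name"], keep=False)

# drop_duplicates with keep=False removes exactly the flagged rows.
kept = df.drop_duplicates(subset=["date", "name"], keep=False)

print(mask.tolist())
print(kept)
```

With the default keep='first', one of the two name1 rows at 14:55:00 would survive and still be aggregated, which is why keep=False is needed here.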
You can use:
(df.loc[~df.duplicated(subset=['date', 'name'], keep=False)]
   .groupby('date', as_index=False)
   .agg({"name": "+".join, "val1": "sum", "val2": "sum"})
)
       date         name  val1  val2
0  15:00:00  name2+name3     7    14
1  15:05:00  name4+name5    11    22
Here, we first remove the rows we don't want to aggregate: those that share both date and name with another row. Rows whose name repeats only across different dates are kept.
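The last point, that a name repeated across different dates is kept, can be checked with a small sketch. The helper name aggregate_distinct_names and the sample rows below are hypothetical, made up for illustration rather than taken from the question.

```python
import pandas as pd

def aggregate_distinct_names(df):
    """Hypothetical helper: drop repeated (date, name) pairs, then aggregate per date."""
    unique_rows = df.loc[~df.duplicated(subset=["date", "name"], keep=False)]
    return (
        unique_rows
        .groupby("date", as_index=False)
        .agg({"name": "+".join, "val1": "sum", "val2": "sum"})
    )

# Illustrative data (not from the question): name1 repeats within 14:55:00
# and also appears once at 15:00:00.
sample = pd.DataFrame({
    "date": ["14:55:00", "14:55:00", "15:00:00", "15:00:00"],
    "name": ["name1", "name1", "name1", "name2"],
    "val1": [1, 2, 3, 4],
    "val2": [2, 4, 6, 8],
})

result = aggregate_distinct_names(sample)
print(result)
```

Both name1 rows at 14:55:00 are dropped, so that date disappears from the result, while the single name1 row at 15:00:00 is still aggregated with name2.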