I am combining several different datasets into one aggregate that I want to analyse in 15-minute intervals.
The dataframe I currently have looks like this:
<bound method NDFrame.to_clipboard of id user_id sentiment magnitude \
2020-10-04 14:06:00 10.0 cPL1Fg7BqRXvSFKeU1mJT7KCCTq2 -0.1 0.1
2020-10-04 14:06:05 11.0 cPL1Fg7BqRXvSFKeU1mJT7KCCTq2 -0.8 0.8
2020-10-05 12:28:58 12.0 cPL1Fg7BqRXvSFKeU1mJT7KCCTq2 -0.2 0.2
2020-10-05 12:29:16 13.0 cPL1Fg7BqRXvSFKeU1mJT7KCCTq2 -0.2 0.2
2020-10-05 12:29:31 14.0 cPL1Fg7BqRXvSFKeU1mJT7KCCTq2 0.2 0.2
angry disgusted fearful happy neutral sad \
2020-10-04 14:06:00 NaN NaN NaN NaN NaN NaN
2020-10-04 14:06:05 NaN NaN NaN NaN NaN NaN
2020-10-05 12:28:58 NaN NaN NaN NaN NaN NaN
2020-10-05 12:29:16 NaN NaN NaN NaN NaN NaN
2020-10-05 12:29:31 NaN NaN NaN NaN NaN NaN
surprised heartRate steps
2020-10-04 14:06:00 NaN NaN NaN
2020-10-04 14:06:05 NaN NaN NaN
2020-10-05 12:28:58 NaN NaN NaN
2020-10-05 12:29:16 NaN NaN NaN
2020-10-05 12:29:31 NaN NaN NaN >
I want to aggregate the dataframe into 15-minute intervals.
I think groupby is the best approach, but I'm finding it hard to get it to work well.
Thanks in advance,
There are two options: you can use resample or pd.Grouper (which is performant).
Here is an example that uses pd.Grouper to sum the column values over 15-minute intervals.
Code
# 'date' must be a datetime64 column; convert it with pd.to_datetime(df['date']) first if it is not
df.groupby(pd.Grouper(key='date', freq='15min')).sum().reset_index()
Input sample from your data
date id
0 2020-10-04 14:06:00 10.0
1 2020-10-04 14:06:05 11.0
2 2020-10-05 12:28:58 12.0
3 2020-10-05 12:29:16 13.0
4 2020-10-05 12:29:31 14.0
Output (first five rows; intervals with no data sum to 0.0)
date id
0 2020-10-04 14:00:00 21.0
1 2020-10-04 14:15:00 0.0
2 2020-10-04 14:30:00 0.0
3 2020-10-04 14:45:00 0.0
4 2020-10-04 15:00:00 0.0
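For the resample option mentioned above, here is a minimal sketch. It assumes the timestamps are the DataFrame index (as in the question's printout) and that averaging is the aggregation you want for the sentiment columns; both are assumptions, so swap in sum() or another function as needed. Only two columns from the question are reproduced to keep the sample short.

```python
import pandas as pd

# Sample rows copied from the question; the timestamps form the index.
idx = pd.to_datetime([
    "2020-10-04 14:06:00",
    "2020-10-04 14:06:05",
    "2020-10-05 12:28:58",
    "2020-10-05 12:29:16",
    "2020-10-05 12:29:31",
])
df = pd.DataFrame(
    {
        "sentiment": [-0.1, -0.8, -0.2, -0.2, 0.2],
        "magnitude": [0.1, 0.8, 0.2, 0.2, 0.2],
    },
    index=idx,
)

# Resample into 15-minute bins, averaging each column within a bin.
# (mean is an assumption here; the answer above uses sum.)
binned = df.resample("15min").mean()

# Bins that contain no rows come back as NaN; drop them if not needed.
binned = binned.dropna(how="all")
print(binned)
```

Unlike sum(), mean() leaves empty intervals as NaN rather than 0.0, which makes them easy to drop with dropna(how="all") when the long run of empty bins between the two days is not wanted.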