Here's my problem: Imagine a dataframe indexed by time.
import pandas as pd

df = pd.DataFrame(
    index=["00:00:00", "00:00:08", "00:00:14", "00:00:21", "00:00:23", "00:00:49"],
    data={"col1": ["a", "b", "a", "a", "c", "d"],
          "col2": [4, 4, 4, 6, 6, 7],
          "col3": [2, 17, 2, 2, 3, 50]})
I would now like to group the data by cumulative time in 15-second steps and apply a function to each group, i.e. to the rows with timestamps in 00:00:00 - 00:00:15, 00:00:00 - 00:00:30, 00:00:00 - 00:00:45, etc.
Let's say, for example, that in each of those intervals I want to sum col2 and col3 over the rows where col1 is "a", and then divide the col2 sum by the col3 sum.
The output should be something like:
output
00:00:15 2
00:00:30 2.3333
Appreciate any help!
First convert the index to timedeltas with to_timedelta and add 15 seconds to shift it, so that each bin is labelled by its end time. Then keep only the "a" rows by boolean indexing with Series.eq (==). Finally use DataFrame.resample with sum, then DataFrame.cumsum, and divide the columns with Series.div:
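df.index = pd.to_timedelta(df.index) + pd.Timedelta(15, unit='s')
# keep only the "a" rows and the numeric columns, sum per 15 s bin, then accumulate
df = df.loc[df['col1'].eq('a'), ['col2', 'col3']].resample('15s').sum().cumsum()
df['out'] = df['col2'].div(df['col3'])
print(df)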
          col2  col3       out
00:00:15     8     4  2.000000
00:00:30    14     6  2.333333
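As a cross-check of what the resample and cumsum steps compute, here is a minimal sketch that builds each cumulative window by hand. It assumes the windows are half-open, so a row stamped exactly at a cutoff would fall into the next window. The names a_rows, cutoff and ratios are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    index=["00:00:00", "00:00:08", "00:00:14", "00:00:21", "00:00:23", "00:00:49"],
    data={"col1": ["a", "b", "a", "a", "c", "d"],
          "col2": [4, 4, 4, 6, 6, 7],
          "col3": [2, 17, 2, 2, 3, 50]})
df.index = pd.to_timedelta(df.index)

# Only the rows whose col1 is "a" take part in the sums.
a_rows = df[df["col1"].eq("a")]

ratios = {}
for label in ["00:00:15", "00:00:30"]:
    cutoff = pd.Timedelta(label)
    # Cumulative window: every "a" row from 00:00:00 up to (not including) the cutoff.
    window = a_rows[a_rows.index < cutoff]
    ratios[label] = window["col2"].sum() / window["col3"].sum()

print(ratios)
```

This loop rescans the data for every cutoff, so it is meant for checking results on small inputs rather than as a replacement for resample.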
An alternative is to convert the index to datetimes:
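df.index = pd.to_datetime(df.index) + pd.Timedelta(15, unit='s')
# keep only the "a" rows and the numeric columns, sum per 15 s bin, then accumulate
df = df.loc[df['col1'].eq('a'), ['col2', 'col3']].resample('15s').sum().cumsum()
df['out'] = df['col2'].div(df['col3'])
print(df)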
                     col2  col3       out
2019-03-21 00:00:15     8     4  2.000000
2019-03-21 00:00:30    14     6  2.333333
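With datetimes the index carries a date as well. If only the time of day is wanted in the labels, as in the requested output, one possible sketch is to format the index with DatetimeIndex.strftime. The small frame below just re-creates the result values shown above, and the explicit format string is an assumption made so the example does not depend on the current date.

```python
import pandas as pd

# Re-create the result from above with a datetime index.
idx = pd.to_datetime(["00:00:15", "00:00:30"], format="%H:%M:%S")
out = pd.DataFrame({"col2": [8, 14], "col3": [4, 6]}, index=idx)
out["out"] = out["col2"].div(out["col3"])

# Replace the datetime index with plain "HH:MM:SS" strings.
out.index = out.index.strftime("%H:%M:%S")
print(out)
```

After this step the index holds strings, so do it last, once all time-based operations are finished.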
df = pd.DataFrame(
    index=["00:00:00", "00:00:08", "00:00:14", "00:00:21", "00:00:23", "00:00:49"],
    data={"col1": ["a", "b", "a", "a", "c", "d"],
          "col2": [4, 4, 4, 6, 6, 7],
          "col3": [2, 17, 2, 2, 3, 50]})
df.index = pd.to_datetime(df.index, format='%H:%M:%S')
# resample(..., how='sum') is no longer supported; call .sum() on the resampler instead
df = df.loc[df['col1'] == 'a', ['col2', 'col3']].resample('15s').sum().cumsum()
df['output'] = df['col2'] / df['col3']
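Without the 15-second shift, resample labels each bin by its start by default, so the rows above would be labelled 00:00:00 and 00:00:15. A minimal sketch of one way to get end-of-interval labels on a datetime index is to pass label='right' to resample. The variable name sums is only illustrative, and the date part of the index (1900-01-01 here) comes from parsing with a time-only format.

```python
import pandas as pd

df = pd.DataFrame(
    index=["00:00:00", "00:00:08", "00:00:14", "00:00:21", "00:00:23", "00:00:49"],
    data={"col1": ["a", "b", "a", "a", "c", "d"],
          "col2": [4, 4, 4, 6, 6, 7],
          "col3": [2, 17, 2, 2, 3, 50]})
df.index = pd.to_datetime(df.index, format="%H:%M:%S")

sums = (
    df.loc[df["col1"] == "a", ["col2", "col3"]]
    # Bins stay closed on the left, [00:00:00, 00:00:15), but take their right edge as label.
    .resample("15s", label="right")
    .sum()
    .cumsum()
)
sums["output"] = sums["col2"] / sums["col3"]
print(sums)
```

The sums and ratios are the same as before; only the labels move from the start to the end of each interval.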