Pandas - groupby cumulative timeperiod
Here's my problem: imagine a dataframe indexed by time.
import pandas as pd

df = pd.DataFrame(index=["00:00:00", "00:00:08", "00:00:14", "00:00:21", "00:00:23", "00:00:49"],
                  data={"col1": ["a", "b", "a", "a", "c", "d"],
                        "col2": [4, 4, 4, 6, 6, 7],
                        "col3": [2, 17, 2, 2, 3, 50]})
I would now like to apply a function and group the data by cumulative time in 15-second steps, i.e. for timestamps between 00:00:00 - 00:00:15, 00:00:00 - 00:00:30, 00:00:00 - 00:00:45, etc.
For example, in each of those intervals I want to sum all values of col2 and all values of col3 for the rows where col1 is "a", and then divide the col2 sum by the col3 sum.
The output should be something like:
output
00:00:15 2
00:00:30 2.3333
Appreciate any help!
First convert the index to timedeltas with to_timedelta and add 15 seconds to shift it, then keep only the "a" rows by boolean indexing with Series.eq (==).
Then use DataFrame.resample with sum, then DataFrame.cumsum, and finally divide the columns with Series.div:
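# Shift every timestamp by 15 seconds so each bin is labelled by its end time
df.index = pd.to_timedelta(df.index) + pd.Timedelta(15, unit='s')
# Keep only the "a" rows and the numeric columns, sum per 15-second bin, then accumulate
df = df.loc[df['col1'].eq('a'), ['col2', 'col3']].resample('15s').sum().cumsum()
df['out'] = df['col2'].div(df['col3'])
print(df)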
col2 col3 out
00:00:15 8 4 2.000000
00:00:30 14 6 2.333333
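As a sanity check, the same two numbers can be reproduced without resample by looping over the cutoffs directly. This is only a minimal sketch on the question's sample data; the helper name cumulative_ratio and the variable names are made up for illustration, and it assumes each window covers timestamps strictly before its cutoff.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame(
    index=["00:00:00", "00:00:08", "00:00:14", "00:00:21", "00:00:23", "00:00:49"],
    data={
        "col1": ["a", "b", "a", "a", "c", "d"],
        "col2": [4, 4, 4, 6, 6, 7],
        "col3": [2, 17, 2, 2, 3, 50],
    },
)
times = pd.to_timedelta(df.index)
is_a = (df["col1"] == "a").to_numpy()


def cumulative_ratio(cutoff):
    """Hypothetical helper: col2 sum / col3 sum over "a" rows before the cutoff."""
    mask = is_a & (times < cutoff)
    return df.loc[mask, "col2"].sum() / df.loc[mask, "col3"].sum()


cutoffs = pd.to_timedelta(["00:00:15", "00:00:30"])
check = pd.Series([cumulative_ratio(c) for c in cutoffs], index=cutoffs)
print(check)
```

The loop rescans the frame for every cutoff, so it is only useful for verifying small inputs; the resample plus cumsum version does the same work in a single pass.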
An alternative is to convert the index to datetimes:
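# Time-only strings get a date attached when parsed; only the time part matters here
df.index = pd.to_datetime(df.index) + pd.Timedelta(15, unit='s')
# Keep only the "a" rows and the numeric columns, sum per 15-second bin, then accumulate
df = df.loc[df['col1'].eq('a'), ['col2', 'col3']].resample('15s').sum().cumsum()
df['out'] = df['col2'].div(df['col3'])
print(df)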
col2 col3 out
2019-03-21 00:00:15 8 4 2.000000
2019-03-21 00:00:30 14 6 2.333333
df = pd.DataFrame(index=["00:00:00", "00:00:08", "00:00:14", "00:00:21", "00:00:23", "00:00:49"],
                  data={"col1": ["a", "b", "a", "a", "c", "d"],
                        "col2": [4, 4, 4, 6, 6, 7],
                        "col3": [2, 17, 2, 2, 3, 50]})
df.index = pd.to_datetime(df.index, format='%H:%M:%S')
# resample(..., how='sum') is no longer supported; call .sum() on the resampler instead
df = df.loc[df['col1'] == 'a', ['col2', 'col3']].resample('15s').sum().cumsum()
df['output'] = df['col2'] / df['col3']
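The snippet above does not shift the index, so each 15-second bin is labelled by its left edge (the resample default for this frequency) and the rows come out under 00:00:00 and 00:00:15. If the end-of-window labels from the question are wanted, one option is to ask resample for right-hand labels instead of adding 15 seconds to the index. A minimal sketch, using the same sample data; the variable name out is just for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    index=["00:00:00", "00:00:08", "00:00:14", "00:00:21", "00:00:23", "00:00:49"],
    data={
        "col1": ["a", "b", "a", "a", "c", "d"],
        "col2": [4, 4, 4, 6, 6, 7],
        "col3": [2, 17, 2, 2, 3, 50],
    },
)
# With an explicit time-only format, the parsed dates fall on 1900-01-01.
df.index = pd.to_datetime(df.index, format="%H:%M:%S")

out = (
    df.loc[df["col1"] == "a", ["col2", "col3"]]
    # Bins are [start, end) and each one is labelled by its end time.
    .resample("15s", label="right", closed="left")
    .sum()
    .cumsum()
)
out["output"] = out["col2"] / out["col3"]
print(out)
```

Here closed="left" keeps a timestamp that falls exactly on a boundary (such as 00:00:15) in the later bin, which matches the behaviour of the shifted-index version.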