Pandas - groupby cumulative timeperiod

Here's my problem: imagine a DataFrame indexed by time (elapsed time as HH:MM:SS strings):

import pandas as pd

df = pd.DataFrame(index=["00:00:00", "00:00:08", "00:00:14", "00:00:21", "00:00:23", "00:00:49"],
                  data={"col1": ["a", "b", "a", "a", "c", "d"],
                        "col2": [4, 4, 4, 6, 6, 7], "col3": [2, 17, 2, 2, 3, 50]})

I would now like to apply a function and group the data by cumulative time in 15-second steps, i.e. over the timestamps in 00:00:00-00:00:15, 00:00:00-00:00:30, 00:00:00-00:00:45, and so on.

For example, in each of those intervals I want to sum col2 and col3 over the rows where col1 is "a", and then divide the col2 sum by the col3 sum.

The output should be something like:

         output
00:00:15    2
00:00:30    2.3333

Appreciate any help!
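To pin down what the cumulative windows mean, here is a brute-force sketch (an illustration, not one of the answers below). It assumes each window is left-closed, i.e. 00:00:00-00:00:15 means 0 <= t < 15 s, and it simply re-filters the "a" rows for every window end; the names `secs`, `is_a`, `ratios` and `result` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(index=["00:00:00", "00:00:08", "00:00:14", "00:00:21", "00:00:23", "00:00:49"],
                  data={"col1": ["a", "b", "a", "a", "c", "d"],
                        "col2": [4, 4, 4, 6, 6, 7], "col3": [2, 17, 2, 2, 3, 50]})

secs = pd.to_timedelta(df.index).total_seconds().to_numpy()  # elapsed seconds per row
is_a = df["col1"].eq("a").to_numpy()

ratios = {}
# Window ends 15, 30, 45, ... until the last timestamp is covered
for end in np.arange(15, secs.max() + 15, 15):
    in_window = (secs < end) & is_a  # cumulative window 0 <= t < end, "a" rows only
    col2_sum = df.loc[in_window, "col2"].sum()
    col3_sum = df.loc[in_window, "col3"].sum()
    ratios[pd.Timedelta(seconds=int(end))] = col2_sum / col3_sum

result = pd.Series(ratios, name="output")
print(result)
```

Because it walks every window up to the last timestamp (00:00:49), it also reports 00:00:45 and 00:01:00, which repeat the 00:00:30 ratio since no "a" rows arrive after 00:00:21.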

First convert the index to timedeltas with to_timedelta and add 15 seconds to shift it, then keep only the "a" rows with boolean indexing and Series.eq (equivalent to ==).

Then sum each 15-second bin with DataFrame.resample and sum, accumulate with DataFrame.cumsum, and finally divide the columns with Series.div:

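import pandas as pd

# Shift the timestamps forward 15 s so each window is labelled by its (original) end time
df.index = pd.to_timedelta(df.index) + pd.Timedelta(15, unit='s')

# Keep only the "a" rows and the numeric columns (so the string column col1 is not summed),
# sum per 15 s bin, then accumulate
df = df.loc[df['col1'].eq('a'), ['col2', 'col3']].resample('15s').sum().cumsum()
df['out'] = df['col2'].div(df['col3'])
print(df)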
          col2  col3       out
00:00:15     8     4  2.000000
00:00:30    14     6  2.333333
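The 15-second shift works because it turns each bin's label into the end of its window. Here is a sketch of the same labelling with plain integer arithmetic and groupby (a swapped-in technique, not the answer's code), assuming elapsed times measured from 00:00:00; `step`, `bin_end` and `per_bin` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(index=["00:00:00", "00:00:08", "00:00:14", "00:00:21", "00:00:23", "00:00:49"],
                  data={"col1": ["a", "b", "a", "a", "c", "d"],
                        "col2": [4, 4, 4, 6, 6, 7], "col3": [2, 17, 2, 2, 3, 50]})

step = 15  # window length in seconds
secs = pd.to_timedelta(df.index).total_seconds().to_numpy()

# Right edge of the left-closed bin each row falls into: [0, 15) -> 15, [15, 30) -> 30, ...
bin_end = (secs // step + 1) * step

is_a = df["col1"].eq("a").to_numpy()
per_bin = df.loc[is_a, ["col2", "col3"]].groupby(bin_end[is_a]).sum()

# groupby only creates non-empty bins; reindex adds zero-sum bins from the first
# window up to the last "a" bin, roughly as resample does
per_bin = per_bin.reindex(np.arange(step, per_bin.index.max() + step, step), fill_value=0)

out = per_bin.cumsum()
out["out"] = out["col2"].div(out["col3"])
out.index = pd.to_timedelta(out.index, unit="s")
print(out)
```

For this data it yields the same two rows as the answer (8/4 and 14/6), but the bin boundaries come from explicit arithmetic rather than from where resample decides to start its bins.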

An alternative is to convert to datetimes:

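# Time-only strings get a date attached when parsed; only the time of day matters here
df.index = pd.to_datetime(df.index) + pd.Timedelta(15, unit='s')

df = df.loc[df['col1'].eq('a'), ['col2', 'col3']].resample('15s').sum().cumsum()
df['out'] = df['col2'].div(df['col3'])
print(df)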
                     col2  col3       out
2019-03-21 00:00:15     8     4  2.000000
2019-03-21 00:00:30    14     6  2.333333
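
Another approach parses the strings as times of day and resamples without the 15-second shift, so each bin is labelled by its start (00:00:00, 00:00:15) instead of its end:

import pandas as pd
df = pd.DataFrame(index=["00:00:00", "00:00:08", "00:00:14", "00:00:21", "00:00:23", "00:00:49"],
                  data={"col1": ["a", "b", "a", "a", "c", "d"], "col2": [4, 4, 4, 6, 6, 7], "col3": [2, 17, 2, 2, 3, 50]})
df.index = pd.to_datetime(df.index, format='%H:%M:%S')
# resample(..., how='sum') no longer exists in current pandas; call .sum() on the resampler
df = df.loc[df['col1'] == 'a', ['col2', 'col3']].resample('15s').sum().cumsum()
df['output'] = df['col2'] / df['col3']
print(df)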
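The snippet above labels each bin by its start time. Assuming a DatetimeIndex as in that snippet, resample's `label='right'` parameter should give end-of-window labels like the expected output without shifting the index; a sketch:

```python
import pandas as pd

df = pd.DataFrame(index=["00:00:00", "00:00:08", "00:00:14", "00:00:21", "00:00:23", "00:00:49"],
                  data={"col1": ["a", "b", "a", "a", "c", "d"],
                        "col2": [4, 4, 4, 6, 6, 7], "col3": [2, 17, 2, 2, 3, 50]})
df.index = pd.to_datetime(df.index, format="%H:%M:%S")  # the date part defaults to 1900-01-01

# label="right" names each left-closed 15 s bin after its end time,
# so the index does not need to be shifted by 15 s first
out = (df.loc[df["col1"].eq("a"), ["col2", "col3"]]
       .resample("15s", label="right")
       .sum()
       .cumsum())
out["output"] = out["col2"] / out["col3"]
print(out)
```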
