
Pandas: group with TimeGrouper

I have data:

i,ID,url,used_at,active_seconds,domain,search_term  
322015,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/antoninaribina,2015-12-31 09:16:05,35,vk.com,None    
838267,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/feed,2015-12-31 09:16:38,54,vk.com,None  
838271,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/feed?section=photos,2015-12-31 09:17:32,34,vk.com,None   
322026,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/feed?section=photos&z=photo143297356_397216312%2Ffeed1_143297356_1451504298,2015-12-31 09:18:06,4,vk.com,None    
838275,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/feed?section=photos,2015-12-31 09:18:10,4,vk.com,None    
322028,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/feed?section=comments,2015-12-31 09:18:14,8,vk.com,None  
322029,0120bc30e78ba5582617a9f3d6dfd8ca,megarand.ru/contest/121070,2015-12-31 09:18:22,16,megarand.ru,None  
1870917,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/feed?section=comments,2015-12-31 09:18:38,6,vk.com,None 
1354612,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/antoninaribina,2015-12-31 09:18:44,56,vk.com,None   

I need to group by ID, and then group by used_at so that a new group starts wherever the difference between two consecutive rows is more than 500 seconds. I tried:

df.groupby([df['ID', 'used_at'],pd.TimeGrouper(freq='5Min')])

But it raises KeyError: ('ID', 'used_at')
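The KeyError comes from `df['ID', 'used_at']`: inside a single pair of brackets, pandas treats `'ID', 'used_at'` as one tuple label and looks for a single column with that name. A minimal sketch on a few rows of the sample data (re-typed by hand, only the columns needed) showing the difference between a tuple key and a list of column names:

```python
import pandas as pd

# A few rows from the question's sample data (subset of columns)
df = pd.DataFrame({
    "ID": ["0120bc30e78ba5582617a9f3d6dfd8ca"] * 3,
    "used_at": ["2015-12-31 09:16:05", "2015-12-31 09:16:38", "2015-12-31 09:17:32"],
    "active_seconds": [35, 54, 34],
})

# Single brackets with two names = one tuple key ('ID', 'used_at') -> no such column
try:
    df["ID", "used_at"]
    tuple_lookup_failed = False
except KeyError:
    tuple_lookup_failed = True

# Double brackets = a list of column names -> selects both columns
subset = df[["ID", "used_at"]]

print(tuple_lookup_failed)   # True
print(list(subset.columns))  # ['ID', 'used_at']
```

In `groupby` itself the same applies: pass column names as a list, e.g. `df.groupby(['ID', ...])`, rather than indexing `df` with a tuple.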

IIUC (if I understand correctly), you need:

df['used_at'] = pd.to_datetime(df['used_at'])
print (df.groupby('ID')['used_at'].diff().dt.total_seconds())
0     NaN
1    33.0
2    54.0
3    34.0
4     4.0
5     4.0
6     8.0
7    16.0
8     6.0
Name: used_at, dtype: float64
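The output above only shows the gaps. To actually start a new group whenever the gap to the previous row of the same ID exceeds 500 seconds, one common pattern (a sketch, not part of the original answer) is to flag those gaps and take a cumulative sum within each ID. The IDs `"a"`/`"b"`, the timestamps, and the column name `session` below are hypothetical, chosen so that some gaps actually exceed 500 seconds:

```python
import pandas as pd

# Hypothetical sample: ID "a" pauses 10 minutes between its 2nd and 3rd rows
df = pd.DataFrame({
    "ID": ["a", "a", "a", "a", "b", "b"],
    "used_at": pd.to_datetime([
        "2015-12-31 09:16:05", "2015-12-31 09:16:38",
        "2015-12-31 09:26:38", "2015-12-31 09:27:00",
        "2015-12-31 09:00:00", "2015-12-31 09:20:00",
    ]),
    "active_seconds": [35, 54, 34, 4, 10, 20],
})

df = df.sort_values(["ID", "used_at"])

# Gap in seconds to the previous row of the same ID (NaN for each ID's first row)
gap = df.groupby("ID")["used_at"].diff().dt.total_seconds()

# True where a new session starts: the first row of an ID, or a gap > 500 s
new_session = gap.isna() | (gap > 500)

# Running count of session starts within each ID -> session number 1, 2, ...
df["session"] = new_session.astype(int).groupby(df["ID"]).cumsum()

result = df.groupby(["ID", "session"])["active_seconds"].sum()
print(result)
```

Here ID `"a"` splits into sessions 1 (35 + 54 = 89) and 2 (34 + 4 = 38), and ID `"b"` into two single-row sessions because its one gap is 1200 seconds.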

If you want to use TimeGrouper, you first have to set a DatetimeIndex; then you can apply any aggregating function, e.g. sum:

df['used_at'] = pd.to_datetime(df.used_at)
df.set_index('used_at', inplace=True)
# pd.TimeGrouper was removed in pandas 1.0; pd.Grouper(freq=...) replaces it
print (df.groupby([df['ID'], pd.Grouper(freq='5Min')]).sum())
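With `pd.Grouper` (the current replacement for `TimeGrouper`), the `key` argument can bin a datetime column directly, so `set_index` is not strictly needed. A minimal sketch on the first five rows of the sample data (ID shortened for readability), summing only `active_seconds`:

```python
import pandas as pd

# First five rows of the question's data (ID shortened)
df = pd.DataFrame({
    "ID": ["0120bc30"] * 5,
    "used_at": ["2015-12-31 09:16:05", "2015-12-31 09:16:38", "2015-12-31 09:17:32",
                "2015-12-31 09:18:06", "2015-12-31 09:18:10"],
    "active_seconds": [35, 54, 34, 4, 4],
})
df["used_at"] = pd.to_datetime(df["used_at"])

# Group by ID and by 5-minute bins of the used_at column; no DatetimeIndex required
out = df.groupby(["ID", pd.Grouper(key="used_at", freq="5min")])["active_seconds"].sum()
print(out)
```

All five timestamps fall into the bin labelled 09:15:00, so the result is a single row with 131. Note that fixed 5-minute bins are not the same as splitting on 500-second gaps: two visits a few seconds apart can land in different bins.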

Another way is to copy the column used_at to the index, so it also stays available as a column:

df['used_at'] = pd.to_datetime(df.used_at)
df.set_index(df['used_at'], inplace=True)
# pd.Grouper(freq=...) replaces pd.TimeGrouper, which was removed in pandas 1.0
print (df.groupby([df['ID'], df['used_at'], pd.Grouper(freq='5Min')]).sum())

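Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.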

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM