
How do I group the data into 5min user bins and subsequently count the records?

I have a data frame containing timestamps:

new_date            id  
-------------------  ----  
2021-03-22 00:12:29 164616
2021-03-22 00:11:51 297284
2021-03-22 00:11:19 148817
2021-03-22 00:11:19 139208
2021-03-22 00:10:29 301459
2021-03-22 00:09:48 299543
2021-03-22 00:09:12 302444

I want to split the timestamps into 5-minute bins and count the ids of the active users that fall within each bin.

new_date            id  
-------------------  ----  
2021-03-22 00:20:00 0
2021-03-22 00:15:00 13
2021-03-22 00:10:00 5
2021-03-22 00:05:00 2

So far I have tried:

date["new_dates"] = pd.to_datetime(date['\tgp:last_session_date'], errors='coerce')
date = date.drop('\tgp:last_session_date', 1)
date.dropna()
df.groupby(pd.Grouper(key ="new_dates", freq = '5Min')).agg({"\tuser_id": "count"})

But it gives a weird output with different dates:

2021-02-24 18:45:00 1
2021-02-24 18:50:00 0
2021-02-24 18:55:00 0
2021-02-24 19:00:00 0
2021-02-24 19:05:00 0

I think the output is expected if there is a stray datetime near 2021-02-24 18:45:00 in the data.

You can sort the original data to see it:

date = date.sort_values('new_dates')

That row is then counted as 1 and the following bins are 0, because no timestamps in the data fall into them (the output is a consecutive DatetimeIndex, so every 5-minute bin between the earliest and latest timestamp appears).
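A minimal sketch of this effect, assuming made-up data: the timestamps, the stray February value, the ids, and the plain `user_id` column name are all invented for illustration and are not the asker's actual data.

```python
import pandas as pd

# Hypothetical sample: three timestamps from 2021-03-22 plus one stray
# value from 2021-02-24, similar to what the output above suggests.
df = pd.DataFrame({
    "new_dates": pd.to_datetime([
        "2021-03-22 00:12:29",
        "2021-03-22 00:11:51",
        "2021-03-22 00:09:48",
        "2021-02-24 18:47:10",
    ]),
    "user_id": [164616, 297284, 299543, 111111],
})

counts = df.groupby(pd.Grouper(key="new_dates", freq="5min"))["user_id"].count()

# The result starts at the bin holding the stray timestamp and includes
# every 5-minute bin up to the latest one, so most bins are empty.
print(counts.head())

# Only the bins that actually contain records.
print(counts[counts > 0])
```

Printing only the non-empty bins makes it easy to spot where the unexpected early dates come from.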

EDIT:

If you need to remove NaNs, you have to assign the output of DataFrame.dropna back, otherwise the call has no effect (or use the inplace alternative):

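import pandas as pd

# Parse the raw column; values that cannot be parsed become NaT
date["new_dates"] = pd.to_datetime(date['\tgp:last_session_date'], errors='coerce')
# Use the columns keyword; a positional axis argument is not accepted by pandas 2.0+
date = date.drop(columns='\tgp:last_session_date')
# dropna returns a new DataFrame, so assign it back
date = date.dropna()
# alternative
# date.dropna(inplace=True)

# Sort to make stray early timestamps visible (same frame and column as above)
date = date.sort_values('new_dates')
print(date)
date.groupby(pd.Grouper(key="new_dates", freq='5Min')).agg({"\tuser_id": "count"})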
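The steps above can be sketched end to end as follows. This is only an illustration under assumptions: the sample rows, the simplified column names `last_session_date` and `user_id`, and the cutoff date used to discard the stray timestamp are all hypothetical. Filtering to the day of interest is one possible way to keep an old timestamp from stretching the index; it is not part of the original answer.

```python
import pandas as pd

# Hypothetical raw data; the column names are simplified stand-ins for the
# tab-prefixed names used in the question.
raw = pd.DataFrame({
    "last_session_date": [
        "2021-03-22 00:12:29",
        "2021-03-22 00:11:19",
        "2021-03-22 00:11:19",
        "2021-03-22 00:09:12",
        "not a date",            # becomes NaT with errors='coerce'
        "2021-02-24 18:47:10",   # stray old timestamp
    ],
    "user_id": [164616, 148817, 139208, 302444, 1, 2],
})

raw["new_dates"] = pd.to_datetime(raw["last_session_date"], errors="coerce")

# dropna returns a new DataFrame, so the result is assigned to a name.
clean = raw.dropna(subset=["new_dates"])

# Assumed cutoff: keep only rows from the day of interest.
day = clean[clean["new_dates"] >= pd.Timestamp("2021-03-22")]

counts = (
    day.groupby(pd.Grouper(key="new_dates", freq="5min"))["user_id"]
       .count()
       .sort_index(ascending=False)  # newest bin first, as in the desired output
)
print(counts)
```

Each bin is labeled by its left edge by default, so a record at 00:11:19 is counted under 00:10:00.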

