
Group by hour with start time and end time datetime columns in CSV with Python/Pandas

I'm just getting my feet wet with Pandas and have gotten pretty stuck. I want to aggregate events (get the count) in a CSV by hour, where each event has a start time and an end time.

For example, given:

event, start, end
soccer, 2020-01-20 00:34:00, 2020-01-20 02:34:00
football, 2020-01-20 00:34:00, 2020-01-20 01:34:00
etc

Expected output:

00:00:00 - 2 (both began in 0th hour and went to 1st hour)
01:00:00 - 2 (both were live in 1st hour)
02:00:00 - 1 (only soccer occurred in 02 hour)

How would you go about this? I've been trying reindexing, resampling, time differences, and time indexes, all with no luck.

What you want is effectively a frequency distribution of the hours during which events are taking place. First, generate the samples from which to take the distribution by creating a range of hours for each event and then exploding it:

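# Assumes 'start' and 'end' are datetime columns (e.g. converted with pd.to_datetime) and each event stays within one day
hours = events.apply(lambda row: range(row['start'].hour, row['end'].hour + 1), axis=1).explode()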

0    0
0    1
0    2
1    0
1    1
dtype: object
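
To reproduce this step from the question's sample, here is a minimal sketch of how `events` might be built. The inline CSV text and `io.StringIO` are assumptions standing in for the real file, and the range here runs from the start hour to the end hour, which yields the same values as the output above for this sample.

```python
import io

import pandas as pd

# Sample rows from the question (an assumed stand-in for the real CSV file).
csv_text = """event, start, end
soccer, 2020-01-20 00:34:00, 2020-01-20 02:34:00
football, 2020-01-20 00:34:00, 2020-01-20 01:34:00
"""

# skipinitialspace drops the blank that follows each comma in the sample.
events = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
# Defensive: strip any whitespace left around the header names.
events.columns = events.columns.str.strip()

# Convert both columns to real datetimes so that .hour is available.
events["start"] = pd.to_datetime(events["start"])
events["end"] = pd.to_datetime(events["end"])

# One entry per hour each event touches; the + 1 makes the end hour inclusive.
hours = events.apply(
    lambda row: list(range(row["start"].hour, row["end"].hour + 1)), axis=1
).explode()
```

The index of `hours` repeats the original row label, so each exploded value can still be traced back to its event.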

Don't forget the + 1 when building the range; without it the last hour of each event is dropped (a fencepost error). Then just get the value counts for the sample. To get the frequencies in hour order instead of by descending count, pass sort=False (calling .sort_index() on the result also guarantees hour order).

hours.value_counts(sort=False)

0    2
1    2
2    1
dtype: int64
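
Comparing only the hour of day breaks down when an event runs past midnight or spans several days. One possible variation, sketched here under assumptions rather than taken from the answer, builds full hourly timestamps with `pd.date_range` instead of bare hour numbers. The `late_show` row and the helper name `hourly_buckets` are made up for illustration.

```python
import pandas as pd

events = pd.DataFrame(
    {
        # 'late_show' is a hypothetical row added to show a midnight crossing.
        "event": ["soccer", "football", "late_show"],
        "start": pd.to_datetime(
            ["2020-01-20 00:34:00", "2020-01-20 00:34:00", "2020-01-20 23:10:00"]
        ),
        "end": pd.to_datetime(
            ["2020-01-20 02:34:00", "2020-01-20 01:34:00", "2020-01-21 00:20:00"]
        ),
    }
)


def hourly_buckets(row):
    # Every hour bucket the event touches, from the hour containing the
    # start to the hour containing the end (both ends inclusive).
    return list(
        pd.date_range(row["start"].floor("h"), row["end"].floor("h"), freq="h")
    )


# Explode to one row per (event, hour bucket), then restore the datetime dtype.
buckets = pd.to_datetime(events.apply(hourly_buckets, axis=1).explode())

# Count events per bucket, in chronological order.
counts = buckets.value_counts().sort_index()
print(counts)
```

With the made-up third row, the 23:00 bucket on 2020-01-20 and the 00:00 bucket on 2020-01-21 each get a count of 1, while the first three buckets keep the counts shown above. Note that an event ending exactly on the hour still counts toward that hour, matching the inclusive end used in the answer.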


