
Aggregate timeseries per-minute data to hourly on large csv files

Hello, I have a question about how to aggregate per-minute data to an hourly level for each road segment. The data should be grouped by hour and by road segment ID. Would this be possible on a 15 GB+ csv? I have already filtered it down to only the relevant road segments, reducing the size to 1-2 GB.

The data set is something like this:

             DateTime              SegmentID    Speed
    2019-10-08T01:00:00+01:00          1          39
    2019-10-08T01:00:01+01:00          1          39
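A file in the 15 GB range may not fit in memory, so one option is to stream it with pandas' `chunksize` and accumulate per-group sums and counts, dividing only at the end so the mean stays exact even when a group is split across chunks. This is a sketch, not from the answer below; the column names come from the sample above, and the in-memory `csv_text` stands in for the real file path:

```python
import io
import pandas as pd

# Tiny stand-in for the large CSV; in practice pass the file path
# to pd.read_csv instead of a StringIO buffer.
csv_text = """DateTime,SegmentID,Speed
2019-10-08T01:00:00+01:00,1,39
2019-10-08T01:00:01+01:00,1,41
2019-10-08T02:00:00+01:00,1,50
2019-10-08T01:30:00+01:00,2,30
"""

# Stream in chunks, accumulating per-(SegmentID, hour) sums and counts.
totals = None
for chunk in pd.read_csv(io.StringIO(csv_text),
                         parse_dates=["DateTime"], chunksize=2):
    grouped = (
        chunk.groupby(["SegmentID", pd.Grouper(key="DateTime", freq="H")])["Speed"]
             .agg(["sum", "count"])
    )
    # add() aligns on the (SegmentID, hour) index, so partial groups
    # from different chunks are merged correctly.
    totals = grouped if totals is None else totals.add(grouped, fill_value=0)

hourly_mean = (totals["sum"] / totals["count"]).sort_index()
print(hourly_mean)
```

Tuning `chunksize` (rows per chunk) trades memory for speed; a few hundred thousand to a few million rows per chunk is a reasonable starting point for a file this size.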

You can use the pandas resample() function. Then it looks like you'd want to get the sum() or mean() of the other columns:

df.resample('H').mean()

If your DateTime column is not your index, you will have to do:

df.resample('H', on='DateTime').mean()

You can use other aggregations instead of mean() or sum(), depending on what you are trying to achieve.
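Since the question also needs the result grouped by road segment, resample() on its own is not quite enough; one way to bucket by both SegmentID and hour in a single pass is groupby() with pd.Grouper. A minimal sketch, using the column names from the sample data:

```python
import pandas as pd

# Small frame shaped like the sample data in the question.
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2019-10-08T01:00:00+01:00",
        "2019-10-08T01:30:00+01:00",
        "2019-10-08T02:00:00+01:00",
        "2019-10-08T01:15:00+01:00",
    ]),
    "SegmentID": [1, 1, 1, 2],
    "Speed": [39, 41, 50, 30],
})

# pd.Grouper(freq="H") buckets the DateTime column by hour the same way
# resample("H") would, while groupby keeps segments separate.
hourly = (
    df.groupby(["SegmentID", pd.Grouper(key="DateTime", freq="H")])["Speed"]
      .mean()
      .reset_index()
)
print(hourly)
```

Note that recent pandas versions prefer the lowercase frequency alias "h" over "H", which still works but may emit a deprecation warning.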
