Aggregate per-minute timeseries data to hourly on large CSV files
Hello, I have a question about how to aggregate per-minute data to an hourly level for each road segment. The data should be grouped by hour and by road segment ID. Would this be possible on a 15 GB+ CSV? I have already filtered down to only the relevant road segments, which reduces the size to 1-2 GB.
The data set looks something like this:
DateTime SegmentID Speed
2019-10-08T01:00:00+01:00 1 39
2019-10-08T01:00:01+01:00 1 39
You can use the resample() pandas function. Then it sounds like you want the sum() or mean() of the other columns:
df.resample('H').mean()
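As a minimal sketch of what that produces, assuming the DataFrame already has a DatetimeIndex (the timestamps and speeds below are made up for illustration):

```python
import pandas as pd

# Tiny demonstration of hourly resampling on a DatetimeIndex
# (values are invented for illustration only).
idx = pd.to_datetime(
    ["2019-10-08 01:00", "2019-10-08 01:30", "2019-10-08 02:15"]
)
df = pd.DataFrame({"Speed": [30, 50, 40]}, index=idx)

hourly = df.resample("H").mean()
print(hourly)
# The 01:00 bucket becomes (30 + 50) / 2 = 40; the 02:00 bucket stays 40.
```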
If your DateTime column is not your index, you will have to do:
df.resample('H', on='DateTime').mean()
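Note that resample() on its own would average across all road segments in each hour. Since the question asks for one hourly value per segment, one way (a sketch, with made-up sample values mirroring the question's layout) is to combine groupby with pd.Grouper:

```python
import io
import pandas as pd

# Sample rows shaped like the question's data (speeds are invented).
csv = io.StringIO(
    "DateTime,SegmentID,Speed\n"
    "2019-10-08T01:00:00+01:00,1,39\n"
    "2019-10-08T01:00:01+01:00,1,39\n"
    "2019-10-08T01:30:00+01:00,1,41\n"
    "2019-10-08T02:05:00+01:00,2,50\n"
)
df = pd.read_csv(csv, parse_dates=["DateTime"])

# Group by segment AND by hour at the same time:
hourly = (
    df.groupby(["SegmentID", pd.Grouper(key="DateTime", freq="H")])["Speed"]
      .mean()
      .reset_index()
)
print(hourly)
```

Each output row is one (SegmentID, hour) pair with the mean speed for that hour.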
You can use other aggregations instead of mean() or sum(), depending on what you are trying to achieve.
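Regarding the file size: if even the filtered 1-2 GB CSV is too big to load at once, you can aggregate in chunks with read_csv's chunksize and combine the partial results. A sketch under assumed column names (the in-memory sample below stands in for the real file path, and chunksize=2 is tiny purely to demonstrate; use something like a million rows for a real file):

```python
import io
import pandas as pd

# In-memory stand-in for the large CSV (values are invented).
sample = io.StringIO(
    "DateTime,SegmentID,Speed\n"
    "2019-10-08T01:00:00+01:00,1,30\n"
    "2019-10-08T01:10:00+01:00,1,50\n"
    "2019-10-08T01:20:00+01:00,2,60\n"
    "2019-10-08T02:00:00+01:00,1,40\n"
)

partials = []
for chunk in pd.read_csv(sample, parse_dates=["DateTime"], chunksize=2):
    # Keep sum and count per (segment, hour) rather than the mean,
    # so chunks can be recombined exactly.
    partials.append(
        chunk.groupby(["SegmentID", pd.Grouper(key="DateTime", freq="H")])["Speed"]
             .agg(["sum", "count"])
    )

# Re-add the partial sums and counts, then divide for the exact mean.
# (Averaging per-chunk means directly would weight chunks unevenly.)
combined = pd.concat(partials).groupby(level=[0, 1]).sum()
hourly_mean = combined["sum"] / combined["count"]
print(hourly_mean)
```

The sum/count trick matters because a (segment, hour) group can be split across two chunks; summing the partials first keeps the mean exact.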