
Minute averages of a CSV file

I have a big CSV file with a datetime and a value recorded every 10 seconds. The CSV file looks like this:

Datetime             Data  
2008-10-01 12:00:10, 34  
2008-10-01 12:00:20, 55  
2008-10-01 12:00:30, 46  
2008-10-01 12:00:40, 33  
2008-10-01 12:00:50, 55  
2008-10-01 12:01:00, 21  
2008-10-01 12:01:10, 2  
2008-10-01 12:01:20, 34  
2008-10-01 12:01:30, 521  
2008-10-01 12:01:40, 45  
2008-10-01 12:01:50, 32  
2008-10-01 12:02:00, 34

I want to write a script that calculates the average for each minute and writes it to a new CSV file, giving the following output:

Datetime             Data  
2008-10-01 12:00:00, 40.67  
2008-10-01 12:01:00, 111.33

Any ideas on how this can be done? Suggestions about modules I should look into, or any examples, would be appreciated.

It seems to me the easiest way is to treat the time as a string, rather than as a time, and use itertools.groupby. This relies on the rows already being in chronological order, since groupby only groups consecutive rows with the same key:

from csv import reader
from itertools import groupby

lines = """Datetime             Data
2008-10-01 12:00:10, 34
2008-10-01 12:00:20, 55
2008-10-01 12:00:30, 46
2008-10-01 12:00:40, 33
2008-10-01 12:00:50, 55
2008-10-01 12:01:00, 21
2008-10-01 12:01:10, 2
2008-10-01 12:01:20, 34
2008-10-01 12:01:30, 521
2008-10-01 12:01:40, 45
2008-10-01 12:01:50, 32
2008-10-01 12:02:00, 34"""

lines = iter(lines.splitlines())

# Everything above is just for testing; with a real file you would use
# with open('filename', newline='') as lines:
# and indent the rest.

next(lines)  # skip the header line

# row[0][:16] is 'YYYY-MM-DD HH:MM', i.e. the timestamp without the seconds
for minute, group in groupby(reader(lines), lambda row: row[0][:16]):
    group = list(group)
    print(minute, sum(float(row[1]) for row in group) / len(group))
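The question also asks for the averages to be written to a new CSV file rather than printed. A minimal sketch of that step, assuming Python 3, chronologically ordered rows, and a single header line; the function name `write_minute_averages` and any file names passed to it are hypothetical and not part of the original answer:

```python
import csv
from itertools import groupby


def write_minute_averages(in_path, out_path):
    """Average the second column per minute and write the result as CSV.

    Hypothetical helper: assumes the first line is a header and that
    rows are already sorted by time (groupby only merges neighbours).
    """
    with open(in_path, newline='') as src, \
            open(out_path, 'w', newline='') as dst:
        # Drop blank lines, which csv.reader yields as empty lists.
        rows = (row for row in csv.reader(src) if row)
        next(rows)  # skip the header line
        writer = csv.writer(dst)
        writer.writerow(['Datetime', 'Data'])
        # row[0][:16] is 'YYYY-MM-DD HH:MM', the timestamp without seconds.
        for minute, group in groupby(rows, key=lambda row: row[0][:16]):
            values = [float(row[1]) for row in group]
            average = sum(values) / len(values)
            writer.writerow([minute + ':00', '%.2f' % average])
```

Because the grouping key is the minute prefix, a reading stamped 12:01:00 is counted towards the 12:01 minute, so the sample data gives 44.60, 109.17 and 34.00 for 12:00, 12:01 and 12:02.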

Use csv.reader to parse the file and a dictionary to cluster the results. The str.rpartition method can split off the seconds. Use sum and len to compute the average:

data = '''\
2008-10-01 12:00:10, 34  
2008-10-01 12:00:20, 55  
2008-10-01 12:00:30, 46  
2008-10-01 12:00:40, 33  
2008-10-01 12:00:50, 55  
2008-10-01 12:01:00, 21  
2008-10-01 12:01:10, 2  
2008-10-01 12:01:20, 34  
2008-10-01 12:01:30, 521  
2008-10-01 12:01:40, 45  
2008-10-01 12:01:50, 32  
2008-10-01 12:02:00, 34
'''.splitlines()

import csv

d = {}  # maps 'YYYY-MM-DD HH:MM' to the list of values seen in that minute
for timestamp, value in csv.reader(data):
    # Split off the seconds: '2008-10-01 12:00:10' -> '2008-10-01 12:00'
    minute, colon, second = timestamp.rpartition(':')
    if minute not in d:
        d[minute] = [float(value)]
    else:
        d[minute].append(float(value))

for minute, values in sorted(d.items()):
    avg_value = sum(values) / len(values)
    print(minute + ',' + str(avg_value))
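Grouping on the minute prefix, as both answers do, puts the 12:01:00 reading into the 12:01 bucket. The expected output in the question (40.67 and 111.33) instead appears to count a reading that falls exactly on the minute towards the preceding minute, so each bucket covers the interval (start, start + 60 s]. A sketch of that variant, assuming whole-second timestamps in the format shown; `minute_bucket` and `minute_averages` are hypothetical names introduced here:

```python
import csv
from collections import defaultdict
from datetime import datetime, timedelta


def minute_bucket(timestamp):
    """Return the start of the interval (start, start + 60 s] holding timestamp."""
    t = datetime.strptime(timestamp.strip(), '%Y-%m-%d %H:%M:%S')
    # Shift back one second so that hh:mm:00 lands in the previous minute.
    # This is only valid because timestamps are assumed to be whole seconds.
    t -= timedelta(seconds=1)
    return t.replace(second=0)


def minute_averages(lines):
    """Return (bucket start, average rounded to 2 places) pairs, sorted by time."""
    buckets = defaultdict(list)
    for timestamp, value in csv.reader(lines):
        buckets[minute_bucket(timestamp)].append(float(value))
    return [
        (start.strftime('%Y-%m-%d %H:%M:%S'), round(sum(v) / len(v), 2))
        for start, v in sorted(buckets.items())
    ]
```

Called with the `data` list from the answer above, `minute_averages(data)` returns `[('2008-10-01 12:00:00', 40.67), ('2008-10-01 12:01:00', 111.33)]`, matching the output requested in the question. Unlike the groupby approach, the dictionary does not require the rows to be sorted.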
