Reading and calculation using csv
I'm new to Python, so pardon me if this question sounds silly.
I have a csv file that has 2 columns: Value and Timestamp. I'm trying to write code that takes 2 parameters, start_date and end_date, traverses the csv file to obtain all the values between those 2 dates, and prints the sum of Value.
Below is my code. I'm trying to read the values and store them in a list.
f_in = open('Users2.csv').readlines()
Value1 = []
Created = []
# Start at 1 to skip the first line of the file.
for i in range(1, len(f_in)):
    Value, created_date = f_in[i].split(',')
    Value1.append(Value)
    Created.append(created_date)
print(Value1)
print(Created)
My csv has the following format:
10 2010-02-12 23:31:40
20 2010-10-02 23:28:11
40 2011-03-12 23:39:40
10 2013-09-10 23:29:34
420 2013-11-19 23:26:17
122 2014-01-01 23:41:51
When I run my code, File1.py, as below:
File1.py 2010-01-01 2011-03-31
The output should be 70.
I'm running into the following issues -
You can try this:
import csv

start_date = 10
end_date = 30
with open('filename.csv') as f:
    data = csv.reader(f)
    # Note: this filters on the first column (the value), not on the timestamp.
    times = [' '.join(i) for i in data if int(i[0]) in range(start_date, end_date)]
Since you said the dates are timestamps, you can compare them as strings: in this format the fields run from year down to second and are zero-padded, so alphabetical order matches chronological order. With that in mind, what you want to achieve (sum the values if created is between start_date and end_date) can be done like this:
def sum_values(start_date, end_date):
    total = 0
    with open('Users2.csv') as f:
        for line in f:
            # Split once, so the space inside the timestamp is kept.
            value, created = line.strip().split(' ', 1)
            if start_date < created < end_date:
                total += int(value)
    return total
str.split(' ', 1) splits on ' ' but stops after one split has been done, so the timestamp keeps its internal space. start_date and end_date must be in the format yyyy-MM-dd hh:mm:ss, which I assume they are, since they are timestamps. Just keep that in mind.
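To see why plain string comparison is enough here, the sketch below takes a few of the sample lines from the question, shows what split(' ', 1) returns, and checks that sorting the timestamps as text gives the same order as sorting them as datetime objects. The names LINES, pairs and same_order are made up for illustration.

```python
from datetime import datetime

FORMAT = '%Y-%m-%d %H:%M:%S'

# Sample lines copied from the question (space-separated value and timestamp).
LINES = [
    '10 2010-02-12 23:31:40',
    '20 2010-10-02 23:28:11',
    '40 2011-03-12 23:39:40',
    '10 2013-09-10 23:29:34',
]

# maxsplit=1 keeps the space inside the timestamp intact.
pairs = [line.split(' ', 1) for line in LINES]
stamps = [created for _, created in pairs]

# Sorting as text gives the same order as sorting as datetime objects,
# because the fields run from most to least significant and are zero-padded.
as_text = sorted(stamps)
as_dates = sorted(stamps, key=lambda s: datetime.strptime(s, FORMAT))
same_order = as_text == as_dates

print(pairs[0])
print(same_order)
```

One consequence worth knowing: a date-only bound such as '2011-03-31' sorts before every full timestamp on that day, so as an upper bound it behaves like midnight at the start of the day.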
Depending on your file size, you may consider loading the values from the csv file into a database and then querying it for your results.
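As a rough illustration of the database idea, here is a minimal sketch using the standard-library sqlite3 module with an in-memory database. The table name users, the column names, and the function sum_values_sql are assumptions made for this example, and the rows are the sample data from the question rather than the real file.

```python
import sqlite3

# Sample rows (value, created) taken from the question's data.
ROWS = [
    (10, '2010-02-12 23:31:40'),
    (20, '2010-10-02 23:28:11'),
    (40, '2011-03-12 23:39:40'),
    (10, '2013-09-10 23:29:34'),
    (420, '2013-11-19 23:26:17'),
    (122, '2014-01-01 23:41:51'),
]

def sum_values_sql(rows, start_date, end_date):
    """Load rows into an in-memory SQLite table and sum values in the date range."""
    conn = sqlite3.connect(':memory:')
    try:
        conn.execute('CREATE TABLE users (value INTEGER, created TEXT)')
        conn.executemany('INSERT INTO users VALUES (?, ?)', rows)
        # Timestamps are stored as text; text comparison is still chronological.
        cur = conn.execute(
            'SELECT COALESCE(SUM(value), 0) FROM users '
            'WHERE created > ? AND created < ?',
            (start_date, end_date),
        )
        return cur.fetchone()[0]
    finally:
        conn.close()

total = sum_values_sql(ROWS, '2010-01-01', '2011-03-31')
print(total)
```

For a file that is queried many times, a database on disk with an index on the timestamp column would avoid rescanning every row; for a one-off sum, reading the file directly is probably simpler.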
The csv module has DictReader, which lets you predefine your column names. This greatly improves readability, especially when working with really big files.
import csv

COLUMN_NAMES = ['value', 'timestamp']

def sum_values(start_date, end_date):
    total = 0
    with open('Users2.csv', mode='r') as csvfile:
        # DictReader splits on commas by default.
        table = csv.DictReader(csvfile, fieldnames=COLUMN_NAMES)
        for row in table:
            if start_date <= row['timestamp'] <= end_date:
                total += int(row['value'])
    return total
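If the bounds might arrive in a different format from the file's timestamps (for example date-only arguments on the command line), comparing parsed datetime objects is a more robust variant of the same loop. This is only a sketch: it assumes comma-separated rows with no header line, uses io.StringIO as a stand-in for Users2.csv, and the function name sum_between is hypothetical.

```python
import csv
import io
from datetime import datetime

COLUMN_NAMES = ['value', 'timestamp']

# Stand-in for Users2.csv; assumes comma-separated rows with no header line.
SAMPLE = io.StringIO(
    '10,2010-02-12 23:31:40\n'
    '20,2010-10-02 23:28:11\n'
    '40,2011-03-12 23:39:40\n'
    '10,2013-09-10 23:29:34\n'
)

def sum_between(csvfile, start_date, end_date):
    """Sum 'value' for rows whose timestamp lies in [start_date, end_date]."""
    # Date-only bounds are parsed as midnight at the start of that day.
    start = datetime.strptime(start_date, '%Y-%m-%d')
    end = datetime.strptime(end_date, '%Y-%m-%d')
    total = 0
    for row in csv.DictReader(csvfile, fieldnames=COLUMN_NAMES):
        stamp = datetime.strptime(row['timestamp'], '%Y-%m-%d %H:%M:%S')
        if start <= stamp <= end:
            total += int(row['value'])
    return total

total = sum_between(SAMPLE, '2010-01-01', '2011-03-31')
print(total)
```

In a script such as File1.py, the two bounds could be taken from sys.argv[1] and sys.argv[2] and the file opened with open('Users2.csv') in place of the StringIO stand-in.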
If you are open to using pandas, try this:
>>> import pandas as pd
>>> data = 'Users2.csv'
>>>
>>> # ISO-style timestamps are parsed by parse_dates without a custom parser.
>>> df = pd.read_csv(data, names=['value', 'date'], parse_dates=['date'])
>>> result = df['value'][(df['date'] > '2010-01-01') &
...                      (df['date'] < '2011-03-31')
...                     ].sum()
>>> result
70