
Reading and calculation using csv

I'm new to Python, so pardon me if this question sounds silly.

I have a csv file that has 2 columns, Value and Timestamp. I'm trying to write code that takes 2 parameters, start_date and end_date, traverses the csv file to obtain all the values between those 2 dates, and prints the sum of Value.

Below is my code. I'm trying to read the values and store them in a list.

f_in = open('Users2.csv').readlines()
Value1 = []
Created = []
for i in range(1, len(f_in)):
    Value, created_date = f_in[i].split(',')
    Value1.append(Value)
    Created.append(created_date)

print Value1
print Created

My csv has the following format:

10  2010-02-12 23:31:40
20  2010-10-02 23:28:11
40  2011-03-12 23:39:40
10  2013-09-10 23:29:34
420 2013-11-19 23:26:17
122 2014-01-01 23:41:51

When I run my code (File1.py) as below:

File1.py 2010-01-01 2011-03-31

The output should be 70.

I'm running into the following issues:

  1. The data in the csv is a timestamp (created_date), but the parameters passed are dates, so I need to convert them and get the data between those 2 dates regardless of the time.
  2. Once I have the data in lists, as described above, how do I do the calculation given the condition in point 1?

You can try this:

import csv

data = csv.reader(open('filename.csv'))
start_date = 10
end_date = 30

# Keep the rows whose first column, read as an integer, lies in [start_date, end_date)
times = [' '.join(i) for i in data if int(i[0]) in range(start_date, end_date)]

Since you said that the dates are timestamps, you can compare them like strings. With that in mind, what you want to achieve (sum the values if created is between start_date and end_date) can be done like this:

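def sum_values(start_date, end_date):
    total = 0  # named total to avoid shadowing the built-in sum()
    with open('Users2.csv') as f:
        for line in f:
            value, created = line.split(' ', 1)
            created = created.strip()  # drop column padding and the trailing newline
            if start_date < created < end_date:
                total += int(value)
    return total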

str.split(' ', 1) splits on ' ' but stops after 1 split has been done. start_date and end_date must be in the format yyyy-MM-dd hh:mm:ss, which I assume they are, because they are in timestamp format. Just keep that in mind.
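The question passes date-only arguments such as 2010-01-01 rather than full timestamps. A minimal sketch of one way to handle that with plain string comparison, assuming the whole end day should be included: compare only the first 10 characters (the date part) of each timestamp. The helper name in_date_range and the inline rows list are hypothetical and only stand in for the file contents shown in the question.

```python
def in_date_range(created, start_date, end_date):
    """Return True if the date part of `created` lies in [start_date, end_date]."""
    day = created.strip()[:10]  # 'yyyy-MM-dd' prefix of 'yyyy-MM-dd hh:mm:ss'
    return start_date <= day <= end_date


# Sample rows copied from the question (value, timestamp)
rows = [
    (10, '2010-02-12 23:31:40'),
    (20, '2010-10-02 23:28:11'),
    (40, '2011-03-12 23:39:40'),
    (10, '2013-09-10 23:29:34'),
    (420, '2013-11-19 23:26:17'),
    (122, '2014-01-01 23:41:51'),
]

total = sum(v for v, ts in rows if in_date_range(ts, '2010-01-01', '2011-03-31'))
print(total)  # 70
```

Comparing the full timestamp against a bare end date would leave out rows from the end day itself, because '2011-03-31 10:00:00' sorts after '2011-03-31'.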

Depending on your file size, you may consider loading the values from the csv file into a database and then querying for your results.
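As a rough illustration of the database idea, here is a sketch using the standard-library sqlite3 module with an in-memory database. The table name users, the column names, and the inline rows are assumptions made for the example; in practice the rows would be read from the csv file.

```python
import sqlite3

# Sample rows copied from the question (value, timestamp)
rows = [
    (10, '2010-02-12 23:31:40'),
    (20, '2010-10-02 23:28:11'),
    (40, '2011-03-12 23:39:40'),
    (10, '2013-09-10 23:29:34'),
    (420, '2013-11-19 23:26:17'),
    (122, '2014-01-01 23:41:51'),
]

conn = sqlite3.connect(':memory:')  # throwaway database for the example
conn.execute('CREATE TABLE users (value INTEGER, created TEXT)')
conn.executemany('INSERT INTO users VALUES (?, ?)', rows)

# date(created) drops the time part, so both bounds are inclusive whole days
(total,) = conn.execute(
    'SELECT COALESCE(SUM(value), 0) FROM users '
    'WHERE date(created) BETWEEN ? AND ?',
    ('2010-01-01', '2011-03-31'),
).fetchone()
conn.close()

print(total)  # 70
```

This only pays off if the file is large or queried repeatedly; for a one-off sum over a small file, reading the csv directly is simpler.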

The csv module has DictReader, which lets you predefine your column names. It greatly improves readability, especially when working on really big files.

import csv

COLUMN_NAMES = ['value', 'timestamp']


def sum_values(start_date, end_date):
    total = 0  # named total to avoid shadowing the built-in sum()

    with open('Users2.csv', mode='r') as csvfile:
        table = csv.DictReader(csvfile, fieldnames=COLUMN_NAMES)
        for row in table:
            # Timestamps in yyyy-MM-dd hh:mm:ss form sort correctly as strings
            if start_date <= row['timestamp'] <= end_date:
                total += int(row['value'])
    return total
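To address the "regardless of time" part of the question more explicitly, the timestamps can be parsed with datetime and compared as dates. The following is only a sketch under some assumptions: the file is comma-separated with an optional header row, the function name sum_between is hypothetical, and an io.StringIO object stands in for the real file.

```python
import csv
import io
from datetime import datetime


def sum_between(csvfile, start_date, end_date):
    """Sum the value column for rows dated within [start_date, end_date]."""
    start = datetime.strptime(start_date, '%Y-%m-%d').date()
    end = datetime.strptime(end_date, '%Y-%m-%d').date()
    total = 0
    for row in csv.DictReader(csvfile, fieldnames=['value', 'timestamp']):
        raw = (row['timestamp'] or '').strip()
        try:
            created = datetime.strptime(raw, '%Y-%m-%d %H:%M:%S')
        except ValueError:
            continue  # skip the header row and any malformed line
        if start <= created.date() <= end:
            total += int(row['value'])
    return total


# Stand-in for open('Users2.csv', newline=''), using the data from the question
sample = io.StringIO(
    "Value,Timestamp\n"
    "10,2010-02-12 23:31:40\n"
    "20,2010-10-02 23:28:11\n"
    "40,2011-03-12 23:39:40\n"
    "10,2013-09-10 23:29:34\n"
    "420,2013-11-19 23:26:17\n"
    "122,2014-01-01 23:41:51\n"
)
print(sum_between(sample, '2010-01-01', '2011-03-31'))  # 70
```

To run it as in the question (File1.py 2010-01-01 2011-03-31), the two dates could be taken from sys.argv[1] and sys.argv[2] and the open file passed in place of the StringIO object.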

If you are open to using pandas, try this:

>>> import pandas as pd
>>> data = 'Users2.csv'
>>>
>>> # parse_dates converts the timestamps in the 'date' column to datetime values
>>> df = pd.read_csv(data, names=['value', 'date'], parse_dates=['date'])

>>> result = df['value'][(df['date'] > '2010-01-01') &
...                      (df['date'] < '2011-03-31')
...                 ].sum()
>>> result
70
