Reading a csv file to parse dates

I have a .csv file containing the following data:

equipement,"144444444"
Date,"Time","measure"
16/09/2016,"07:15:00","16.47777"
16/09/2016,"07:30:00","15.44454"
16/09/2016,"07:45:00","16.21114"

I run Python code on this file, and my goal is to get output like the following:

"measure","20160916071500","16.47777"
"measure","20160916073000","15.44454"
"measure","20160916074500","16.21114"

Here is my code:

import csv
import sys
import os
import re
import fnmatch
import csv
from dateutil.parser import parse as parseDate
from datetime import datetime, time, timedelta


    file = open("myfile.csv", 'rt')
    reader = csv.reader(file)
    next(reader)
    rows = list(reader)
    firstline = rows[0]
    header = firstline[2]
    print header
    for row in rows:
        next(reader)
        print rows[0]
        if "".join(row).strip() != "":
            chaine = str(row[0]+row[1])
            #print chaine
            date = chaine[:10] + " " + chaine[11:]
            #print date
            index = parseDate(date)
            index = str(index).replace('-','')
            index = str(index).replace(':','')
            index = str(index).replace(' ','')
            data = row[2]

My problem is that I need to call next(reader) to skip the first and second lines of the file, because they do not contain any dates. But I get this error:

Traceback (most recent call last):
  File "t.py", line 19, in <module>
    next(reader)
StopIteration

Any ideas?

By calling rows = list(reader), you have already exhausted reader and collected every remaining row in a list named rows. Calling next(reader) after that raises StopIteration.
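To see the exhaustion in isolation, here is a minimal sketch. It assumes Python 3 and uses a small in-memory string (a hypothetical stand-in for myfile.csv) so that it runs without any file on disk.

```python
import csv
import io

# Hypothetical in-memory sample standing in for myfile.csv
sample = (
    'equipement,"144444444"\n'
    'Date,"Time","measure"\n'
    '16/09/2016,"07:15:00","16.47777"\n'
)

reader = csv.reader(io.StringIO(sample))
rows = list(reader)        # consumes every row the reader can produce
print(len(rows))           # number of rows collected

exhausted = False
try:
    next(reader)           # the iterator has nothing left to yield
except StopIteration:
    exhausted = True
    print("reader is exhausted")
```

A csv.reader is a one-pass iterator: once a row has been handed out, whether to list(), a for loop, or next(), it is not produced again.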

However, there is no need to build the rows list at all. You can iterate over reader directly with a for loop:

reader = csv.reader(file)
next(reader)               # skip first line
secondline = next(reader)  # capture second line
header = secondline[2]
for row in reader:         # iterate from third line to the end
    # next(reader) <-- don't do this, the for loop already does it for you
    if "".join(row).strip() != "":
        # ... your code processing row ...
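For completeness, the loop above could be filled in roughly as follows. This is only a sketch under a few assumptions: Python 3, dates always in day/month/year order, and a helper name (convert) and output path that are made up for illustration. It also swaps dateutil's parser for datetime.strptime with an explicit format, so a date such as 01/09/2016 cannot be read as January 9.

```python
import csv
from datetime import datetime


def convert(in_path, out_path):
    """Hypothetical helper: write "<header>","YYYYMMDDHHMMSS","<value>" rows."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst, quoting=csv.QUOTE_ALL)  # quote every field
        next(reader)                  # skip the equipment line
        header = next(reader)[2]      # third column name, e.g. "measure"
        for row in reader:            # continues from the third line
            if not "".join(row).strip():
                continue              # ignore blank lines
            # Assumed input format: DD/MM/YYYY and HH:MM:SS
            stamp = datetime.strptime(row[0] + " " + row[1], "%d/%m/%Y %H:%M:%S")
            writer.writerow([header, stamp.strftime("%Y%m%d%H%M%S"), row[2]])
```

Calling convert("myfile.csv", "out.csv") would then write one quoted line per measurement, in the shape the question asks for.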

If you prefer, you can solve it with pandas:

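import csv
import pandas as pd

# skiprows=2 drops the equipment line and the column-header line;
# dtype=str keeps the measure values exactly as written in the file
df = pd.read_csv('in.csv', skiprows=2, header=None, dtype=str)
# combine the date and time columns; the explicit format avoids any
# day/month ambiguity (nested parse_dates lists are deprecated in recent pandas)
dt = pd.to_datetime(df[0] + ' ' + df[1], format='%d/%m/%Y %H:%M:%S')
df['dt'] = dt.dt.strftime('%Y%m%d%H%M%S')
df['mes'] = 'measure'
df[['mes', 'dt', 2]].to_csv('out.csv', quoting=csv.QUOTE_ALL, index=False, header=False)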

CSV file:

"measure","20160916071500","16.47777"
"measure","20160916073000","15.44454"
"measure","20160916074500","16.21114"

You can get the same desired output using only two for loops and some string replacement, as in the following example (I assume your input file is named in.csv):

data = list(k.strip("\n") for k in open("in.csv", 'r'))
mesure = data[1].split(",")[2]
m = list(k.replace('"', "").split(",") for k in data[2:])

final, d =[], ""
for k in m:
    for j in k[:-1]:
        if "/" in j:
            d = '"%s' % "".join(j.split("/")[::-1])
        if ":" in j:
            d += '%s"' % "".join(j.split(":"))
    final.append(",".join([mesure, d,'"%s"' % k[-1:][0]]))

for k in final:
    print(k)

Output:

"measure","20160916071500","16.47777"
"measure","20160916073000","15.44454"
"measure","20160916074500","16.21114"
