
Parsing csv file using datetime in python to identify unique dates

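I have filtered a long time series of weather variables to remove the observations that do not meet certain criteria. For example, all remaining data points fall between 11 am (11) and 5 pm (17). The data points between 11 and 17 on a given day represent a single event, and not every day contains an event. I am trying to determine which days have an event.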

The data looks like this:

hd,Station Number,Year Month Day Hours Minutes in YYYY,MM,DD,HH24,MI format in Local time,Year Month Day Hours Minutes in YYYY,MM,DD,HH24,MI format in Local standard time,Year Month Day Hours Minutes in YYYY,MM,DD,HH24,MI format in Universal coordinated time,Precipitation since last (AWS) observation in mm,Quality of precipitation since last (AWS) observation value,Air Temperature in degrees Celsius,Quality of air temperature,Air temperature (1-minute maximum) in degrees Celsius,Quality of air temperature (1-minute maximum),Air temperature (1-minute minimum) in degrees Celsius,Quality of air temperature (1-minute minimum),Wet bulb temperature in degrees Celsius,Quality of Wet bulb temperature,Wet bulb temperature (1 minute maximum) in degrees Celsius,Quality of wet bulb temperature (1 minute maximum),Wet bulb temperature (1 minute minimum) in degrees Celsius,Quality of wet bulb temperature (1 minute minimum),Dew point temperature in degrees Celsius,Quality of dew point temperature,Dew point temperature (1-minute maximum) in degrees Celsius,Quality of Dew point Temperature (1-minute maximum),Dew point temperature (1 minute minimum) in degrees Celsius,Quality of Dew point Temperature (1 minute minimum),Relative humidity in percentage %,Quality of relative humidity,Relative humidity (1 minute maximum) in percentage %,Quality of relative humidity (1 minute maximum),Relative humidity (1 minute minimum) in percentage %,Quality of Relative humidity (1 minute minimum),Wind (1 minute) speed in km/h,Wind (1 minute) speed quality,Minimum wind speed (over 1 minute) in km/h,Minimum wind speed (over 1 minute) quality,Wind (1 minute) direction in degrees true,Wind (1 minute) direction quality,Standard deviation of wind (1 minute),Standard deviation of wind (1 minute) direction quality,Maximum wind gust (over 1 minute) in km/h,Maximum wind gust (over 1 minute) quality,Visibility (automatic - one minute data) in km,Quality of visibility (automatic - one minute data),Mean sea level pressure in hPa,Quality of mean sea level pressure,Station level pressure in hPa,Quality of station level pressure,QNH pressure in hPa,Quality of QNH pressure,#
    hd,40842,2000,3,22,13,40,2000,3,22,13,40,2000,3,22,13,40,0,N,20.4,N,20.5,N,20.4,N,20.2,N,20.2,N,20.1,N,20.1,N,20.1,N,20,N,98,N,,N,,N,9,N,8,N,18,N,7,N,11,N,,N,1013.3,N,1012.2,N,1013.3,N,#
    hd,40842,2000,3,22,13,47,2000,3,22,13,47,2000,3,22,13,47,0,N,20.5,N,20.5,N,20.5,N,20.2,N,20.2,N,20.2,N,20.1,N,20.1,N,20,N,97,N,,N,,N,4,N,0,N,56,N,75,N,5,N,,N,1013.2,N,1012.1,N,1013.2,N,#
    hd,40842,2000,3,23,11,0,2000,3,23,11,0,2000,3,23,11,0,0,N,23.4,N,23.4,N,23.3,N,21.3,N,21.4,N,21.3,N,20.2,N,20.3,N,20.2,N,82,N,,N,,N,8,N,5,N,66,N,2,N,9,N,,N,1013.6,N,1012.5,N,1013.6,N,#
    hd,40842,2000,3,23,11,1,2000,3,23,11,1,2000,3,23,11,1,0,N,23.4,N,23.4,N,23.4,N,21.4,N,21.4,N,21.3,N,20.3,N,20.3,N,20.2,N,82,N,,N,,N,8,N,5,N,68,N,3,N,9,N,,N,1013.6,N,1012.5,N,1013.6,N,#

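Ideally, the output file would have the same format as the data shown above, but contain only the rows that mark the start and end of each unique event. Here is my attempt at code to perform this task.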

import csv
import datetime

with open("X:/weatherresults/final output/weather_out_2000_2006_time_filtered_and_speed_filtered.csv", "rb") as input, open("X:\weatherresults\sea_breeze_dates.csv", "wb") as wanted:
    reader = csv.DictReader(input, delimiter=",", skipinitialspace=True)
    fieldnames = reader.fieldnames
    writer_wanted = csv.DictWriter(wanted, fieldnames, delimiter=",")
    prev_row = None
    for line_number, row in enumerate(reader):
        try:
            dt = datetime.date(year=row["Year Month Day Hours Minutes in YYYY"], month=row["MM"], day=row["DD"])
            if prev_row is not None and dt > prev_row['dt']:
                writer_wanted.writerow(prev_row['row'])
                writer_wanted.writerow(row)
            prev_row = {'row':row, 'dt':dt}
        except:
            print "Failed to parse line", line_number
            print row       

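The code does not raise any errors, but it always ends up in the except branch. That is, it fails to parse every single row, and the output file contains no data. Can anyone see the error in my code that causes it to fail on every row?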

I think you are passing strings to the date() function. You need to convert the fields to integers with int().

Also, it would probably be simpler to use the groupby() function to group the rows by date.
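As a rough illustration of the groupby() idea, here is a minimal sketch in Python 3 (the question's code is Python 2). The helper names row_date and event_bounds are made up for this example, and the in-memory sample keeps only four of the real file's columns. itertools.groupby only merges consecutive rows with the same key, which is assumed to be fine here because the file is in chronological order. One caveat: the real header repeats the YYYY, MM, DD and HH24 names for the three time bases, and csv.DictReader keeps the last value for a repeated name, so on the real file these keys would refer to the UTC columns.

```python
import csv
import datetime
import io
from itertools import groupby

# Column name taken from the question's header.
YEAR = "Year Month Day Hours Minutes in YYYY"


def row_date(row):
    """Build a date from the year, month and day columns (hypothetical helper)."""
    return datetime.date(int(row[YEAR]), int(row["MM"]), int(row["DD"]))


def event_bounds(rows):
    """Yield (date, first_row, last_row) for each run of rows sharing a date."""
    for day, group in groupby(rows, key=row_date):
        group = list(group)
        yield day, group[0], group[-1]


# Tiny stand-in for the real file: only four of its columns are kept.
sample = io.StringIO(
    "Year Month Day Hours Minutes in YYYY,MM,DD,HH24\n"
    "2000,3,22,13\n"
    "2000,3,22,14\n"
    "2000,3,23,11\n"
)
for day, first, last in event_bounds(csv.DictReader(sample)):
    print(day, first["HH24"], last["HH24"])
```

Each yielded pair of rows could then be written out with a csv.DictWriter, as in the question. A day with a single observation yields the same row as both start and end.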

On the face of it, your problem is this line:

dt = datetime.date(year=row["Year Month Day Hours Minutes in YYYY"], month=row["MM"], day=row["DD"])

datetime.date takes integers, not strings. Something like this will fix your TypeError:

year = row["Year Month Day Hours Minutes in YYYY"]
month = row["MM"]
day = row['DD']
year = int(year)
month = int(month)
day = int(day)
dt = datetime.date(year=year,month=month,day=day)

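The real problem is your try/except statement. Because it is a blanket except (i.e. it does not name a specific error class), it swallows every exception, so you never see an error message that would let you debug your code. If you do run into parsing errors that you want to skip, use: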

try:
    ...
except <errorname>:
    ...
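To make the point about naming the error class concrete, here is a small Python 3 sketch (the question's code is Python 2). The function name parse_date is made up for this example, and the choice of KeyError and ValueError is an assumption about which failures are worth skipping. A TypeError, such as the one caused by passing strings to datetime.date, is deliberately not caught, so a programming mistake surfaces with a full traceback instead of being hidden.

```python
import datetime

# Column name taken from the question's header.
YEAR = "Year Month Day Hours Minutes in YYYY"


def parse_date(row, line_number):
    """Return the row's date, or None if a field is missing or malformed."""
    try:
        return datetime.date(int(row[YEAR]), int(row["MM"]), int(row["DD"]))
    except (KeyError, ValueError) as exc:
        # KeyError: the column is absent from the row.
        # ValueError: non-numeric text, or an impossible date such as 30 February.
        print("Failed to parse line", line_number, repr(exc))
        return None


good = {YEAR: "2000", "MM": "3", "DD": "22"}
bad = {YEAR: "2000", "MM": "2", "DD": "30"}
print(parse_date(good, 0))
print(parse_date(bad, 1))
```

Printing repr(exc) keeps the actual error message visible, which is what the blanket except in the question was hiding.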
