How to normalize dates in a CSV file? (Python)

Question

I have a CSV file with a field named start_date that contains data in a variety of formats.

Some of the formats include, e.g., June 23, 1912 or 5/11/1930 (month, day, year). But not all values are valid dates.

I want to add a start_date_description field adjacent to the start_date column and move invalid date values into it. Lastly, I want to normalize all valid date values in start_date to ISO 8601 (i.e., YYYY-MM-DD).

So far I have only been able to read the start_date values from my file. I am stuck and would appreciate any help. Any solution, especially one that does not use a library, would be great!

import csv

date_column = "start_date"
results = []

with open("test.csv", "r", newline="") as f:
    csv_reader = csv.reader(f)
    headers = None  # indexes of the date column, set from the header row
    for row in csv_reader:
        if headers is None:
            # Compare by equality; "col in date_column" was a substring test
            headers = [i for i, col in enumerate(row) if col == date_column]
        else:
            results.append([row[i] for i in headers])

print(results)

Answer 1

One way is to use the dateutil module. You can parse dates as follows:

from dateutil import parser
parser.parse('3/16/78')
parser.parse('4-Apr') # this will give current year i.e. 2017

The parsed value can then be formatted to your target format with:

dt = parser.parse('3/16/78')
dt.strftime('%Y-%m-%d')

Suppose you have the table in a DataFrame. You can then define a parsing function and apply it to the column as follows:

def parse_date(value):
    try:
        return parser.parse(value).strftime('%Y-%m-%d')
    except (ValueError, OverflowError, TypeError):
        # Value could not be parsed as a date
        return ''

df['parse_date'] = df.start_date.map(parse_date)
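
Since the question asks for a solution without a library, the same try/except idea can be sketched with only the standard library. This is a minimal sketch, not part of the answer above: the FORMATS tuple is an assumption that covers just the two example formats from the question, and %B relies on an English locale for month names.

```python
from datetime import datetime

# Assumed formats, taken from the two examples in the question.
FORMATS = ('%B %d, %Y', '%m/%d/%Y')

def parse_date(value, formats=FORMATS):
    """Return the date as YYYY-MM-DD, or '' if no format matches."""
    for fmt in formats:
        try:
            return datetime.strptime(value.strip(), fmt).strftime('%Y-%m-%d')
        except ValueError:
            # This format did not match; try the next one.
            continue
    return ''

print(parse_date('June 23, 1912'))
print(parse_date('5/11/1930'))
print(parse_date('Facere voluptas'))
```

Unlike dateutil, strptime only accepts the formats it is given, so every layout that occurs in the file has to be listed explicitly.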

Answer 2

Question: ... add a start_date_description ... normalize ... to ISO 8601

This reads the file test.csv and validates the date string in the start_date column against date directive patterns, returning a dict {description, ISO}. The returned dict is used to update the current row dict, and the updated row dict is written to the file test_update.csv.

Put this in a new Python file and run it!

A missing valid date directive pattern can simply be added to the list.

See the Python 3.6 documentation: 8.1.8. strftime() and strptime() Behavior

import csv
import re
from datetime import datetime as dt

def validate(date):
    def _dict(desc, date):
        return {'start_date_description': desc, 'ISO': date}

    # Each entry is (strptime pattern, description); add missing patterns here.
    # Optional extra pattern: ('%B %d. %Y', 'Valid')
    patterns = [('%m/%d/%y', 'Valid'), ('%b-%y', 'Short, missing Day'),
                ('%d-%b-%y', 'Valid'), ('%d-%b', 'Short, missing Year')]
    for pattern, desc in patterns:
        try:
            _dt = dt.strptime(date, pattern)
            return _dict(desc, _dt.strftime('%Y-%m-%d'))
        except ValueError:
            # Pattern did not match; try the next one
            continue

    if not re.search(r'\d+', date):
        return _dict('No Digit', None)

    return _dict('Unknown Pattern', None)

with open('test.csv', newline='') as fh_in, \
        open('test_update.csv', 'w', newline='') as fh_out:
    csv_reader = csv.DictReader(fh_in)
    csv_writer = csv.DictWriter(fh_out,
                                fieldnames=csv_reader.fieldnames +
                                           ['start_date_description', 'ISO'])
    csv_writer.writeheader()

    # Data rows start at line 2 of the file; line 1 is the header
    for row, values in enumerate(csv_reader, 2):
        values.update(validate(values['start_date']))

        # Show only invalid dates
        if any(w in values['start_date_description']
               for w in ['Unknown', 'No Digit', 'missing']):
            print('{:>3}: {v[start_date]:13.13} {v[start_date_description]:<22} {v[ISO]}'.
                  format(row, v=values))

        csv_writer.writerow(values)

Output:

start_date     start_date_description  ISO
June 23. 1912  Valid                   1912-06-23
12/31/91       Valid                   1991-12-31
Oct-84         Short, missing Day      1984-10-01
Feb-09         Short, missing Day      2009-02-01
10-Dec-80      Valid                   1980-12-10
10/7/81        Valid                   1981-10-07
Facere volupt  No Digit                None
... (omitted for brevity)
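
One caveat with the %y patterns, given that the question mentions dates such as 1912 and 1930: strptime maps two-digit years 69 to 99 onto 1969 to 1999 and 0 to 68 onto 2000 to 2068. The sketch below illustrates this and shows one possible workaround. parse_past_date is a hypothetical helper, and the rule that no date lies in the future is an assumption about the data, not something stated in the question.

```python
from datetime import date, datetime

def two_digit_year(value):
    """Parse month/day/2-digit-year and return the four-digit year chosen."""
    return datetime.strptime(value, '%m/%d/%y').year

def parse_past_date(value, today=None):
    """Parse month/day/2-digit-year, assuming the date is never in the future."""
    today = today or date.today()
    parsed = datetime.strptime(value, '%m/%d/%y').date()
    if parsed > today:
        # A future result means the pivot picked the wrong century.
        parsed = parsed.replace(year=parsed.year - 100)
    return parsed.isoformat()

print(two_digit_year('12/31/91'))  # 1991
print(two_digit_year('5/11/30'))   # 2030, not 1930
print(parse_past_date('5/11/30', today=date(2017, 7, 8)))  # 1930-05-11
```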

Tested with Python 3.4.2
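
The script above appends the new columns at the end of each row, while the question asks for start_date_description to sit next to start_date and to receive the invalid values. A minimal sketch of that layout follows. The names to_iso and normalize are hypothetical, the validator accepts only the month/day/four-digit-year format to keep the example short, and in-memory files stand in for test.csv and test_update.csv.

```python
import csv
import io
from datetime import datetime

def to_iso(value):
    """Return YYYY-MM-DD for a month/day/4-digit-year string, else None."""
    try:
        return datetime.strptime(value.strip(), '%m/%d/%Y').strftime('%Y-%m-%d')
    except ValueError:
        return None

def normalize(fh_in, fh_out, column='start_date'):
    reader = csv.DictReader(fh_in)
    # Insert the description column directly after the date column.
    fieldnames = list(reader.fieldnames)
    fieldnames.insert(fieldnames.index(column) + 1, column + '_description')
    writer = csv.DictWriter(fh_out, fieldnames=fieldnames, lineterminator='\n')
    writer.writeheader()
    for row in reader:
        iso = to_iso(row[column])
        # Valid dates are normalized in place; anything else is moved
        # to the description column and the date column is left empty.
        row[column + '_description'] = '' if iso else row[column]
        row[column] = iso or ''
        writer.writerow(row)

source = io.StringIO('name,start_date,city\nA,5/11/1930,Oslo\nB,unknown,Rome\n')
target = io.StringIO()
normalize(source, target)
print(target.getvalue())
```

To run it against real files, pass handles opened with newline='' instead of the StringIO objects, and swap to_iso for a validator that tries every pattern the data needs.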

2 answers

Answer 1
3 2017-07-08 08:24:55

Answer 2
1 2017-07-08 20:42:59
