How would I normalize dates in a CSV file? (Python)
I have a CSV file with a field named start_date that contains data in a variety of formats. Some of the formats include, e.g., June 23, 1912 or 5/11/1930 (month, day, year). But not all values are valid dates.

I want to add a start_date_description field adjacent to the start_date column to filter invalid date values into. Lastly, I want to normalize all valid date values in start_date to ISO 8601 (i.e., YYYY-MM-DD).

So far I have only been able to load the start_date values in my script. I am stuck and would appreciate any help. Any solution, especially one that does not use a library, would be great!

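import csv

# One-element tuple, so that "col in date_column" compares whole column names
# (a bare string would do a substring match instead).
date_column = ("start_date",)
f = open("test.csv", "r")
csv_reader = csv.reader(f)
headers = None
results = []
for row in csv_reader:
    if not headers:
        # First row: remember the index of every date column.
        headers = []
        for i, col in enumerate(row):
            if col in date_column:
                headers.append(i)
    else:
        results.append([row[i] for i in headers])
f.close()
print(results)
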
One way is to use the dateutil module (the third-party python-dateutil package). You can parse date strings as follows:

from dateutil import parser
parser.parse('3/16/78')
parser.parse('4-Apr')  # no year given, so the current year is used (2017 at the time of writing)

The parsed value can then be formatted as ISO 8601 with strftime:

dt = parser.parse('3/16/78')
dt.strftime('%Y-%m-%d')
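
The formatting step itself does not depend on dateutil. As a minimal sketch, assuming parser.parse('3/16/78') yields the datetime for 16 March 1978 (it is constructed by hand here so the snippet runs with the standard library only), strftime('%Y-%m-%d') and date().isoformat() produce the same ISO 8601 string:

```python
from datetime import datetime

# Stand-in for the value dateutil is assumed to return for '3/16/78'.
dt = datetime(1978, 3, 16)

iso_strftime = dt.strftime('%Y-%m-%d')  # explicit format string
iso_builtin = dt.date().isoformat()     # built-in ISO 8601 date formatting

print(iso_strftime, iso_builtin)
```

Either form works; isoformat() avoids having to remember the directive string.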

Suppose you have the table in a pandas DataFrame (called df here). You can then define a parsing function and apply it to the column as follows:

def parse_date(x):
    # Return the ISO 8601 date, or an empty string if x cannot be parsed.
    try:
        return parser.parse(x).strftime('%Y-%m-%d')
    except (ValueError, OverflowError, TypeError):
        return ''

df['parse_date'] = df.start_date.map(parse_date)

Question: ... add a start_date_description ... normalize ... to ISO 8601

This reads the file test.csv, validates the date string in the start_date column against a list of date directive patterns, and returns a dict with the keys start_date_description and ISO. The returned dict is used to update the current row dict, and the updated row dict is written to the file test_update.csv.

Put this in a new Python file and run it. A valid date directive pattern that is missing can simply be added to the list of patterns.
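
As a sketch of what adding a pattern might look like, the snippet below covers the two formats named in the question, June 23, 1912 and 5/11/1930. The names PATTERNS and to_iso are hypothetical and only serve this illustration, and the month name directive %B assumes the default (English) locale:

```python
from datetime import datetime

# (strptime pattern, description) pairs, tried in order. The last two
# entries are hypothetical additions for the formats in the question.
PATTERNS = [
    ('%m/%d/%y', 'Valid'),   # 12/31/91
    ('%d-%b-%y', 'Valid'),   # 10-Dec-80
    ('%B %d, %Y', 'Valid'),  # June 23, 1912 (added)
    ('%m/%d/%Y', 'Valid'),   # 5/11/1930     (added)
]


def to_iso(text):
    """Return the ISO 8601 date for text, or None if no pattern matches."""
    for pattern, _desc in PATTERNS:
        try:
            return datetime.strptime(text.strip(), pattern).strftime('%Y-%m-%d')
        except ValueError:
            continue  # pattern did not match, try the next one
    return None


for sample in ['June 23, 1912', '5/11/1930', '12/31/91', 'not a date']:
    print(sample, '->', to_iso(sample))
```

Because strptime rejects a string with unconverted data left over, 5/11/1930 fails the two-digit-year pattern and falls through to the four-digit one, so both can sit in the same list.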

Python » 3.6 Documentation: 8.1.8. strftime() and strptime() Behavior

import csv
import re
from datetime import datetime as dt


def validate(date):
    """Return the description and the ISO 8601 form (or None) of a date string."""
    def _dict(desc, date):
        return {'start_date_description': desc, 'ISO': date}

    # (strptime pattern, description) pairs, tried in order.
    patterns = [('%m/%d/%y', 'Valid'),
                ('%b-%y', 'Short, missing Day'),
                ('%d-%b-%y', 'Valid'),
                ('%d-%b', 'Short, missing Year')]
                # Optional extra pattern: ('%B %d. %Y', 'Valid')
    for pattern, desc in patterns:
        try:
            _dt = dt.strptime(date, pattern)
            return _dict(desc, _dt.strftime('%Y-%m-%d'))
        except ValueError:
            continue

    if not re.search(r'\d+', date):
        return _dict('No Digit', None)
    return _dict('Unknown Pattern', None)


with open('test.csv', newline='') as fh_in, \
        open('test_update.csv', 'w', newline='') as fh_out:
    csv_reader = csv.DictReader(fh_in)
    csv_writer = csv.DictWriter(
        fh_out,
        fieldnames=csv_reader.fieldnames + ['start_date_description', 'ISO'])
    csv_writer.writeheader()

    # Data rows start at line 2 of the file (line 1 is the header).
    for row, values in enumerate(csv_reader, 2):
        values.update(validate(values['start_date']))

        # Show only invalid dates
        if any(w in values['start_date_description']
               for w in ['Unknown', 'No Digit', 'missing']):
            print('{:>3}: {v[start_date]:13.13} {v[start_date_description]:<22} {v[ISO]}'
                  .format(row, v=values))

        csv_writer.writerow(values)

Output:

start_date     start_date_description  ISO
June 23. 1912  Valid                   1912-06-23
12/31/91       Valid                   1991-12-31
Oct-84         Short, missing Day      1984-10-01
Feb-09         Short, missing Day      2009-02-01
10-Dec-80      Valid                   1980-12-10
10/7/81        Valid                   1981-10-07
Facere volupt  No Digit                None
... (omitted for brevity)

Tested with Python: 3.4.2
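
The "Short, missing Day" rows in the output above end in -01 because strptime fills in a default for any field the pattern does not mention, and two-digit years are expanded around a fixed pivot. A small standard-library sketch of both behaviours follows; Oct-84 is taken from the output, while the 1/1/69 and 1/1/68 strings are added purely for illustration:

```python
from datetime import datetime

# A pattern without a day directive defaults the day to 1.
short = datetime.strptime('Oct-84', '%b-%y').date()
print(short)  # 1984-10-01

# %y maps 69-99 to 1969-1999 and 00-68 to 2000-2068.
year_69 = datetime.strptime('1/1/69', '%m/%d/%y').year
year_68 = datetime.strptime('1/1/68', '%m/%d/%y').year
print(year_69, year_68)  # 1969 2068
```

For data like the question's, which reaches back to 1912, this means a two-digit year such as 12 would come out as 2012, so four-digit-year patterns (%Y) are the safer choice where the source provides them.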