Converting date formats in pandas dataframe
I have a dataframe whose Date column contains two different date formats,
e.g. 1983-11-10 00:00:00 and 10/11/1983.
I want them all to be the same type. How can I iterate through the Date column of my dataframe and convert the dates to one format?
I believe you need the parameter dayfirst=True in to_datetime:
import pandas as pd

df = pd.DataFrame({'Date': {0: '1983-11-10 00:00:00', 1: '10/11/1983'}})
print(df)
                  Date
0  1983-11-10 00:00:00
1           10/11/1983
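# dayfirst=True reads 10/11/1983 as 10 November.
# Note: pandas 2.0+ expects one consistent format by default; for mixed input, also pass format='mixed'.
df['Date'] = pd.to_datetime(df.Date, dayfirst=True)
print(df)
        Date
0 1983-11-10
1 1983-11-10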
because, on the original strings, the default parsing reads 10/11/1983 month-first (as 11 October):
df['Date'] = pd.to_datetime(df.Date)
print(df)
        Date
0 1983-11-10
1 1983-10-11
Or you can specify both formats and then use combine_first:
# Each call parses only the values matching its format; the rest become NaT.
d1 = pd.to_datetime(df.Date, format='%Y-%m-%d %H:%M:%S', errors='coerce')
d2 = pd.to_datetime(df.Date, format='%d/%m/%Y', errors='coerce')
# Take d1 where it parsed, otherwise fall back to d2.
df['Date'] = d1.combine_first(d2)
print(df)
        Date
0 1983-11-10
1 1983-11-10
General solution for multiple formats:
from functools import reduce

def convert_formats_to_datetimes(col, formats):
    # Parse the column once per format; non-matching values become NaT.
    out = [pd.to_datetime(col, format=x, errors='coerce') for x in formats]
    # Earlier formats take priority; later ones only fill the remaining NaT.
    return reduce(lambda l, r: l.combine_first(r), out)

formats = ['%Y-%m-%d %H:%M:%S', '%d/%m/%Y']
df['Date'] = df['Date'].pipe(convert_formats_to_datetimes, formats)
print(df)
        Date
0 1983-11-10
1 1983-11-10
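One thing to watch with errors='coerce': values that match none of the formats silently become NaT. Below is a minimal sketch for spotting them, assuming a made-up sample whose third value uses an unsupported format (the names raw, parsed and unparsed are illustrative only, not part of the answer above):

```python
import pandas as pd

# Hypothetical sample: the third value matches neither format.
raw = pd.Series(['1983-11-10 00:00:00', '10/11/1983', 'Nov 10, 1983'])

formats = ['%Y-%m-%d %H:%M:%S', '%d/%m/%Y']
parsed = pd.to_datetime(raw, format=formats[0], errors='coerce')
for fmt in formats[1:]:
    # Always parse the original strings, then fill only the gaps.
    parsed = parsed.fillna(pd.to_datetime(raw, format=fmt, errors='coerce'))

# Rows that had a value in the input but are still NaT matched no format.
unparsed = raw[parsed.isna() & raw.notna()]
print(unparsed)
```

Checking unparsed before overwriting the column makes it easier to decide whether another format needs to be added to the list.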
I want them all to be the same type, how can I iterate through the Date column of my dataframe and convert the dates to one format?
Your input data is ambiguous: is 10/11 the 10th of November or the 11th of October? You need to specify logic to determine which is appropriate. A function is useful if you wish to try multiple date formats sequentially:
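def date_apply_formats(s, form_lst):
    # Parse with the first format; values that do not match become NaT.
    out = pd.to_datetime(s, format=form_lst[0], errors='coerce')
    for form in form_lst[1:]:
        # Fill the remaining NaT by parsing the original strings (s), not the partial result.
        out = out.fillna(pd.to_datetime(s, format=form, errors='coerce'))
    return out

df['Date'] = date_apply_formats(df['Date'], ['%Y-%m-%d %H:%M:%S', '%d/%m/%Y'])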
Priority is given to the first item in form_lst. The solution extends to an arbitrary number of provided formats.
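To make the ambiguity and the priority rule concrete, here is a minimal standard-library sketch (plain datetime rather than pandas; the helper name parse_with_priority is made up for illustration). It tries the formats in order and returns the first one that parses, so swapping the order changes the result for an ambiguous string:

```python
from datetime import datetime

def parse_with_priority(text, formats):
    """Return the first successful parse of `text`, trying `formats` in order."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not match; try the next one
    return None  # no format matched

ambiguous = '10/11/1983'
day_first = parse_with_priority(ambiguous, ['%d/%m/%Y', '%m/%d/%Y'])
month_first = parse_with_priority(ambiguous, ['%m/%d/%Y', '%d/%m/%Y'])

print(day_first.date())    # 1983-11-10
print(month_first.date())  # 1983-10-11
```

Both formats accept the string, which is why the order of the list has to encode which interpretation you actually want.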
Input data is:
  NSECODE      Date      Close
1  NSE500  20000103  1291.5500
2  NSE500  20000104  1335.4500
3  NSE500  20000105  1303.8000
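# Parse the compact YYYYMMDD values explicitly; casting to str also covers an integer column.
history_nseindex_df["Date"] = pd.to_datetime(history_nseindex_df["Date"].astype(str), format="%Y%m%d")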
history_nseindex_df["Date"] = history_nseindex_df["Date"].dt.strftime("%Y-%m-%d")
Output is now:
  NSECode        Date      Close
1  NSE500  2000-01-03  1291.5500
2  NSE500  2000-01-04  1335.4500
3  NSE500  2000-01-05  1303.8000
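Note that dt.strftime returns text, not dates, so the reformatted column no longer supports datetime operations. A minimal sketch, assuming the Date values are stored as strings (the small frame below is made up from the sample rows above), that keeps a datetime column for calculations and a separate string view for display:

```python
import pandas as pd

# Hypothetical frame rebuilt from the sample rows, with Date as strings.
df = pd.DataFrame({'Date': ['20000103', '20000104', '20000105']})

# Real datetime values: usable for sorting, filtering, and date arithmetic.
dates = pd.to_datetime(df['Date'], format='%Y%m%d')

# String rendering in a single format: for display or export only.
as_text = dates.dt.strftime('%Y-%m-%d')

print(dates.dt.year.tolist())
print(as_text.tolist())
```

If the column is numeric rather than text, cast it with astype(str) first, since by default to_datetime treats plain integers as offsets from the epoch rather than as YYYYMMDD values.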