
Converting date formats in pandas dataframe

I have a dataframe whose Date column contains two different date formats,

e.g. 1983-11-10 00:00:00 and 10/11/1983

I want them all to be the same type. How can I iterate through the Date column of my dataframe and convert the dates to one format?

I believe you need the parameter dayfirst=True in to_datetime (on pandas 2.0 and later, a column that mixes formats also needs format='mixed', otherwise parsing raises an error):

import pandas as pd

df = pd.DataFrame({'Date': {0: '1983-11-10 00:00:00', 1: '10/11/1983'}})
print (df)
                  Date
0  1983-11-10 00:00:00
1           10/11/1983


df['Date'] = pd.to_datetime(df.Date, dayfirst=True)
print (df)
        Date
0 1983-11-10
1 1983-11-10

because without dayfirst=True the second value is parsed month-first (as 11 October):

df['Date'] = pd.to_datetime(df.Date)
print (df)
        Date
0 1983-11-10
1 1983-10-11
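
To make the ambiguity concrete, here is a minimal sketch that parses the same string under both conventions. It uses scalar strings rather than the dataframe above, and the variable names are only illustrative.

```python
import pandas as pd

# The same string read under both conventions
month_first = pd.to_datetime("10/11/1983")               # default: month first
day_first = pd.to_datetime("10/11/1983", dayfirst=True)  # day first

print(month_first.date())  # 1983-10-11
print(day_first.date())    # 1983-11-10

# A leading value above 12 cannot be a month, so only the day-first reading is valid
unambiguous = pd.to_datetime("25/12/1983", dayfirst=True)
print(unambiguous.date())  # 1983-12-25
```

Only strings where both the first and second fields are 12 or less are affected, which is why this kind of error can go unnoticed in a long column.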

Or you can parse the column once per format and then merge the results with combine_first:

d1 = pd.to_datetime(df.Date, format='%Y-%m-%d %H:%M:%S', errors='coerce')
d2 = pd.to_datetime(df.Date, format='%d/%m/%Y', errors='coerce')

df['Date'] = d1.combine_first(d2)
print (df)
        Date
0 1983-11-10
1 1983-11-10
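
Because errors='coerce' silently turns anything that does not match into NaT, it may be worth checking which rows matched neither format. The sketch below assumes a hypothetical third value, 'not a date', added to the sample data for illustration.

```python
import pandas as pd

# Sample data plus one hypothetical value that matches neither format
raw = pd.Series(["1983-11-10 00:00:00", "10/11/1983", "not a date"])

d1 = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S", errors="coerce")
d2 = pd.to_datetime(raw, format="%d/%m/%Y", errors="coerce")
parsed = d1.combine_first(d2)

# Rows that had a value but matched neither format
unparsed = raw[parsed.isna() & raw.notna()]
print(unparsed.tolist())  # ['not a date']
```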

General solution for multiple formats:

from functools import reduce 

def convert_formats_to_datetimes(col, formats):
    out = [pd.to_datetime(col, format=x, errors='coerce') for x in formats]
    return reduce(lambda l,r: pd.Series.combine_first(l,r), out)

formats = ['%Y-%m-%d %H:%M:%S', '%d/%m/%Y']
df['Date'] = df['Date'].pipe(convert_formats_to_datetimes, formats)
print (df)
        Date
0 1983-11-10
1 1983-11-10

I want them all to be the same type, how can I iterate through the Date column of my dataframe and convert the dates to one format?

Your input data is ambiguous: is 10/11 the 10th of November or the 11th of October? You need to specify logic to determine which is appropriate. A function is useful if you wish to try multiple date formats sequentially:

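def date_apply_formats(s, form_lst):
    # Parse with the first format; values that do not match become NaT
    out = pd.to_datetime(s, format=form_lst[0], errors='coerce')
    for form in form_lst[1:]:
        # Re-parse the original strings (not the already converted result) and fill only the gaps
        out = out.fillna(pd.to_datetime(s, format=form, errors='coerce'))
    return out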

df['Date'] = date_apply_formats(df['Date'], ['%Y-%m-%d %H:%M:%S', '%d/%m/%Y'])

Priority is given to the first item in form_lst: a value that matches an earlier format is never re-parsed with a later one. The solution extends to any number of formats.
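
A small sketch of how that priority plays out when two formats can both match the same string. The helper name parse_with_formats and the sample values are hypothetical, chosen only to show the effect of ordering.

```python
import pandas as pd

def parse_with_formats(strings, formats):
    """Hypothetical helper: try each format in order; earlier formats win."""
    result = pd.to_datetime(strings, format=formats[0], errors="coerce")
    for fmt in formats[1:]:
        # Always parse the original strings, and only fill values still missing
        result = result.fillna(pd.to_datetime(strings, format=fmt, errors="coerce"))
    return result

raw = pd.Series(["10/11/1983", "12/25/1983"])

day_first = parse_with_formats(raw, ["%d/%m/%Y", "%m/%d/%Y"])
month_first = parse_with_formats(raw, ["%m/%d/%Y", "%d/%m/%Y"])

print(day_first.dt.strftime("%Y-%m-%d").tolist())    # ['1983-11-10', '1983-12-25']
print(month_first.dt.strftime("%Y-%m-%d").tolist())  # ['1983-10-11', '1983-12-25']
```

The second value parses the same way in both orders because 25 cannot be a month, while the first value changes meaning with the order of the list.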

Input data is:

   NSECODE      Date      Close
1   NSE500  20000103  1291.5500
2   NSE500  20000104  1335.4500
3   NSE500  20000105  1303.8000

history_nseindex_df["Date"] = pd.to_datetime(history_nseindex_df["Date"])
history_nseindex_df["Date"] = history_nseindex_df["Date"].dt.strftime("%Y-%m-%d")

Output is now:

   NSECode        Date      Close
1   NSE500  2000-01-03  1291.5500
2   NSE500  2000-01-04  1335.4500
3   NSE500  2000-01-05  1303.8000
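
The answer does not say whether the Date column holds strings or integers. If it holds integers, to_datetime without a format treats them as offsets from the epoch (nanoseconds by default) rather than as YYYYMMDD, so an explicit format is safer. A minimal sketch, assuming an integer column and a hypothetical frame named prices:

```python
import pandas as pd

# Hypothetical frame mirroring the sample rows, with Date stored as integers
prices = pd.DataFrame({
    "NSECODE": ["NSE500", "NSE500", "NSE500"],
    "Date": [20000103, 20000104, 20000105],
    "Close": [1291.55, 1335.45, 1303.80],
})

# Cast to str and give the format so the digits are read as year, month, day
parsed = pd.to_datetime(prices["Date"].astype(str), format="%Y%m%d")

# dt.strftime returns text, not datetimes
prices["Date"] = parsed.dt.strftime("%Y-%m-%d")
print(prices["Date"].tolist())  # ['2000-01-03', '2000-01-04', '2000-01-05']
```

Note that after strftime the column contains plain strings, so keep the parsed datetime column instead if you still need date arithmetic or sorting by date.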


 