
Converting date formats in pandas dataframe

I have a dataframe and the Date column has two different types of date formats going on.

e.g. 1983-11-10 00:00:00 and 10/11/1983

I want them all to be the same type, how can I iterate through the Date column of my dataframe and convert the dates to one format?

I believe you need the parameter dayfirst=True in to_datetime:

import pandas as pd

df = pd.DataFrame({'Date': {0: '1983-11-10 00:00:00', 1: '10/11/1983'}})
print(df)
                  Date
0  1983-11-10 00:00:00
1           10/11/1983


df['Date'] = pd.to_datetime(df.Date, dayfirst=True)
print (df)
        Date
0 1983-11-10
1 1983-11-10

because without dayfirst=True the ambiguous 10/11/1983 is read month-first, as 11 October:

df['Date'] = pd.to_datetime(df.Date)
print (df)
        Date
0 1983-11-10
1 1983-10-11
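
A note for newer pandas versions: as far as I know, from pandas 2.0 on to_datetime infers a single format from the first value and applies it to the whole column, so a column that mixes both formats raises a ValueError unless per-element parsing is requested with format='mixed'. A minimal sketch, assuming pandas 2.0 or later (the fallback branch is only there for older releases, which already parsed element by element):

```python
import pandas as pd

s = pd.Series(['1983-11-10 00:00:00', '10/11/1983'])

try:
    # pandas >= 2.0: infer the format of every element individually;
    # dayfirst=True resolves the ambiguous 10/11/1983 as 10 November
    parsed = pd.to_datetime(s, format='mixed', dayfirst=True)
except ValueError:
    # assumption: older pandas without format='mixed', where
    # per-element inference was the default behaviour
    parsed = pd.to_datetime(s, dayfirst=True)

print(parsed)
```

Both rows should come out as 1983-11-10. Per-element inference is convenient but risky on ambiguous data, which is why the explicit-format approaches are usually the safer choice.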

Or you can parse the column once per format (errors='coerce' turns values that do not match into NaT) and then merge the results with combine_first:

d1 = pd.to_datetime(df.Date, format='%Y-%m-%d %H:%M:%S', errors='coerce')
d2 = pd.to_datetime(df.Date, format='%d/%m/%Y', errors='coerce')

df['Date'] = d1.combine_first(d2)
print (df)
        Date
0 1983-11-10
1 1983-11-10

General solution for multiple formats:

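from functools import reduce

def convert_formats_to_datetimes(col, formats):
    # parse the column once per format; values that do not match become NaT
    out = [pd.to_datetime(col, format=x, errors='coerce') for x in formats]
    # fill the gaps from left to right, so earlier formats take priority
    return reduce(lambda l, r: l.combine_first(r), out)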

formats = ['%Y-%m-%d %H:%M:%S', '%d/%m/%Y']
df['Date'] = df['Date'].pipe(convert_formats_to_datetimes, formats)
print (df)
        Date
0 1983-11-10
1 1983-11-10
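
Since errors='coerce' silently turns anything that matches none of the formats into NaT, it can be worth checking what was left unparsed. A small sketch of that check; the third value and the names raw and unparsed are made up for illustration:

```python
from functools import reduce

import pandas as pd

raw = pd.Series(['1983-11-10 00:00:00', '10/11/1983', 'Nov 10, 1983'])
formats = ['%Y-%m-%d %H:%M:%S', '%d/%m/%Y']

# one parse attempt per format, merged so earlier formats take priority
candidates = [pd.to_datetime(raw, format=f, errors='coerce') for f in formats]
parsed = reduce(lambda l, r: l.combine_first(r), candidates)

# rows that had a value but matched none of the formats
unparsed = raw[parsed.isna() & raw.notna()]
print(unparsed)
```

Here only the last row ('Nov 10, 1983') ends up in unparsed, which tells you a third format is needed in the list.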

I want them all to be the same type, how can I iterate through the Date column of my dataframe and convert the dates to one format?

Your input data is ambiguous: is 10/11 the 10th of November or the 11th of October? You need to specify logic to determine which is appropriate. A function is useful if you wish to try multiple date formats sequentially:

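def date_apply_formats(s, form_lst):
    # keep the original strings in s so that every format is tried on the raw values
    out = pd.to_datetime(s, format=form_lst[0], errors='coerce')
    for form in form_lst[1:]:
        # only fill the values that all earlier formats failed to parse
        out = out.fillna(pd.to_datetime(s, format=form, errors='coerce'))
    return out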

df['Date'] = date_apply_formats(df['Date'], ['%Y-%m-%d %H:%M:%S', '%d/%m/%Y'])

Priority is given to the first item in form_lst: a value is only tried against a later format if every earlier one failed. The solution extends to an arbitrary number of formats.
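
To make the priority rule concrete, here is a small sketch with a self-contained variant of the function that always parses from the original strings. The sample dates are made up; the point is that swapping the order of two overlapping formats changes the result for ambiguous values:

```python
import pandas as pd

def date_apply_formats(s, form_lst):
    # always parse from the original strings; earlier formats win
    out = pd.to_datetime(s, format=form_lst[0], errors='coerce')
    for form in form_lst[1:]:
        out = out.fillna(pd.to_datetime(s, format=form, errors='coerce'))
    return out

dates = pd.Series(['10/11/1983', '25/12/1983'])

day_first = date_apply_formats(dates, ['%d/%m/%Y', '%m/%d/%Y'])
month_first = date_apply_formats(dates, ['%m/%d/%Y', '%d/%m/%Y'])

print(day_first.dt.strftime('%Y-%m-%d').tolist())
print(month_first.dt.strftime('%Y-%m-%d').tolist())
```

With day-first priority the first value becomes 1983-11-10; with month-first priority it becomes 1983-10-11, while 25/12/1983 can only be read day-first and falls through to the second format. So overlapping formats can leave one column with mixed interpretations, and the order of the list matters.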

Input data is:

   NSECODE      Date      Close
1   NSE500  20000103  1291.5500
2   NSE500  20000104  1335.4500
3   NSE500  20000105  1303.8000

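# the dates are stored as YYYYMMDD, so pass the format explicitly
# (astype(str) makes this work whether the column holds integers or strings)
history_nseindex_df["Date"] = pd.to_datetime(history_nseindex_df["Date"].astype(str), format="%Y%m%d")
# strftime turns the datetimes back into strings in the desired layout
history_nseindex_df["Date"] = history_nseindex_df["Date"].dt.strftime("%Y-%m-%d")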

Output is now:

   NSECode        Date      Close
1   NSE500  2000-01-03  1291.5500
2   NSE500  2000-01-04  1335.4500
3   NSE500  2000-01-05  1303.8000
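
One caveat with YYYYMMDD values: if the Date column is stored as integers rather than strings (an assumption here, the post does not say), to_datetime without a format treats the numbers as nanoseconds since 1970-01-01. A short sketch of the difference, using a hypothetical frame rebuilt from the sample rows:

```python
import pandas as pd

df = pd.DataFrame({
    'NSECODE': ['NSE500', 'NSE500', 'NSE500'],
    'Date': [20000103, 20000104, 20000105],   # assumption: integer column
    'Close': [1291.55, 1335.45, 1303.80],
})

# without a format, integers are treated as nanoseconds since 1970-01-01
naive = pd.to_datetime(df['Date'])

# with an explicit format the digits are read as year, month, day
parsed = pd.to_datetime(df['Date'].astype(str), format='%Y%m%d')

print(naive.dt.year.tolist())
print(parsed.dt.strftime('%Y-%m-%d').tolist())
```

The naive call lands every row in 1970, while the explicit format gives 2000-01-03, 2000-01-04 and 2000-01-05.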
