
Pandas dataframe to_datetime() is converting date incorrectly

I have a date string in this format: '17-JUL-53'

When I call pd.to_datetime('17-JUL-53'), it returns Timestamp('2053-07-17 00:00:00').

You could argue that is correct, but the date I actually need is 1953-07-17. Excel gets this right; how can I do the same with to_datetime()?

[edit] Just to show what happens when we convert from str to time in plain Python:

>>> import time
>>> time.strptime('17-JUL-53', '%d-%b-%y')
time.struct_time(tm_year=2053, tm_mon=7, tm_mday=17, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=3, tm_yday=198, tm_isdst=-1)

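I think you need to insert the century, '19', in front of the two-digit year, and then parse the result with a four-digit-year format.
The format codes are described in the Python datetime documentation.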

import pandas as pd

s = '17-JUL-53'
# Insert the century between '17-JUL-' and '53'
d = s[:7] + '19' + s[7:]
print(d)
#17-JUL-1953
dt = pd.to_datetime(d, format='%d-%b-%Y')
print(dt)
#1953-07-17 00:00:00

%d-%b-%Y means:

%d - Day of the month as a zero-padded decimal number
%b - Month as locale's abbreviated name
%Y - Year with century as a decimal number
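
To see why the century has to be spelled out, here is a small sketch using only the standard library. It relies on the documented Python behaviour for %y (two-digit years 69-99 map to 1969-1999, and 00-68 map to 2000-2068) and assumes an English or C locale so that 'JUL' is recognised by %b:

```python
from datetime import datetime

# %y applies the POSIX pivot: 00-68 -> 2000-2068, 69-99 -> 1969-1999
two_digit = datetime.strptime('17-JUL-53', '%d-%b-%y')
boundary = datetime.strptime('17-JUL-69', '%d-%b-%y')

# %Y takes the century from the string itself, so nothing is guessed
four_digit = datetime.strptime('17-JUL-1953', '%d-%b-%Y')

print(two_digit.year, boundary.year, four_digit.year)
# 2053 1969 1953
```

So '53' can never come out as 1953 through %y alone, which is why the answers here either add the century to the string or shift the parsed result.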

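I would do it this way, provided all your dates are in the 1900s :)

import pandas as pd
from dateutil.relativedelta import relativedelta

date_str = '17-jul-53'  # renamed from `input` to avoid shadowing the builtin
output = pd.to_datetime(date_str)
# Shift the parsed date back by one century
output_clean = output - relativedelta(years=100)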
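
If only some of the dates belong to the 1900s, the same idea could be applied conditionally to a whole column. This is just a sketch: the sample data and the cutoff date are made up for illustration, and it assumes that any date parsed to a point after the cutoff really belongs to the previous century.

```python
import pandas as pd

# Hypothetical sample data
dates = pd.Series(['17-JUL-53', '03-JAN-99', '25-DEC-05'])

# %y gives 2053, 1999 and 2005 respectively
parsed = pd.to_datetime(dates, format='%d-%b-%y')

# Assumed cutoff: anything later than this cannot be a valid date in the data
cutoff = pd.Timestamp('2024-01-01')

# Move only the "future" dates back by 100 years
fixed = parsed.mask(parsed > cutoff, parsed - pd.DateOffset(years=100))
print(fixed.dt.year.tolist())
# [1953, 1999, 2005]
```

The right cutoff depends on the data; for birth dates, for example, today's date might be a reasonable choice.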

Somehow you need to state which century you mean. The to_datetime function cannot work this out for you, so you have to handle it upstream. Here is an approach with a regex:

import re
import pandas as pd

date = '17-JUL-53'

pd.to_datetime(re.sub(r'(\d{2}-\w{3}-)(\d{2})', r'\g<1>19\2', date))
#Timestamp('1953-07-17 00:00:00')
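
For a whole column, the same substitution might look like the sketch below. The sample Series is hypothetical, every year is assumed to belong to the 1900s, and the pattern is anchored with ^ and $ so that strings in another layout are left untouched rather than partially rewritten.

```python
import re
import pandas as pd

# Hypothetical sample data; all years are assumed to be 19xx
dates = pd.Series(['17-JUL-53', '03-JAN-48'])

# Same idea as the regex above: capture 'dd-Mon-' and the two-digit year
pattern = re.compile(r'^(\d{2}-[A-Za-z]{3}-)(\d{2})$')

# Insert '19' in front of the two-digit year in each string
with_century = dates.map(lambda s: pattern.sub(r'\g<1>19\2', s))

# An explicit four-digit-year format avoids any century guessing
result = pd.to_datetime(with_century, format='%d-%b-%Y')
print(result.dt.year.tolist())
# [1953, 1948]
```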

