Pandas dataframe to_datetime() is converting date incorrectly
I have a date in this format: '17-JUL-53'.
When I call pd.to_datetime('17-JUL-53'), it returns Timestamp('2053-07-17 00:00:00').
You could say that is correct, but the date I actually need is 1953-07-17. Excel gets this right; how do we do the same with to_datetime()?
[edit] Just to show what happens when we convert from str to time in Python:
>>> import time
>>> time.strptime('17-JUL-53', '%d-%b-%y')
time.struct_time(tm_year=2053, tm_mon=7, tm_mday=17, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=3, tm_yday=198, tm_isdst=-1)
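For context, here is a small sketch of why the result above is 2053. Python's `%y` directive follows the POSIX convention, in which two-digit years 69-99 map to 1969-1999 and 00-68 map to 2000-2068. The helper name `parsed_year` is only illustrative, and the month abbreviations assume an English locale.

```python
from datetime import datetime

def parsed_year(text):
    """Return the year that strptime infers from a DD-MON-YY string."""
    return datetime.strptime(text, "%d-%b-%y").year

# 68 and 69 sit on either side of the POSIX pivot; 53 falls on the 2000s side.
years = {t: parsed_year(t) for t in ("17-JUL-68", "17-JUL-69", "17-JUL-53")}
print(years)
```

So '53' is below the pivot and lands in the 21st century unless the century is supplied some other way.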
I think you need to insert the century, 19, in front of the two-digit year.
More info about datetime formatting is here.
import pandas as pd

s = '17-JUL-53'
# Insert the century in front of the two-digit year
d = s[:7] + '19' + s[7:]
print(d)
# 17-JUL-1953
dt = pd.to_datetime(d, format='%d-%b-%Y')
print(dt)
# 1953-07-17 00:00:00
%d-%b-%Y means:
%d - Day of the month as a zero-padded decimal number
%b - Month as locale's abbreviated name
%Y - Year with century as a decimal number
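Since the question concerns a DataFrame column, the same slicing idea can be applied to a whole Series at once. This is a minimal sketch: the sample values and the names `dates`, `with_century`, and `parsed` are made up for illustration, and it assumes every value is exactly in DD-MON-YY form, belongs to the 1900s, and uses English month abbreviations.

```python
import pandas as pd

# Hypothetical sample column of DD-MON-YY strings, all assumed to be in the 1900s.
dates = pd.Series(["17-JUL-53", "01-JAN-68", "31-DEC-99"])

# Insert the century before the two-digit year, element-wise.
with_century = dates.str[:7] + "19" + dates.str[7:]

# Parse with an explicit four-digit-year format so no century guessing occurs.
parsed = pd.to_datetime(with_century, format="%d-%b-%Y")
print(parsed)
```

Passing `format` explicitly keeps the parsing unambiguous once the year has four digits.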
I would do it this way, provided all your dates are in the 1900s :)
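import pandas as pd
from dateutil.relativedelta import relativedelta

raw = '17-jul-53'  # renamed from `input` to avoid shadowing the builtin
output = pd.to_datetime(raw)  # parsed as 2053-07-17
output_clean = output - relativedelta(years=100)  # shift back one century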
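If some dates genuinely belong to the 2000s, subtracting 100 years from everything would be wrong. One possible variation, sketched here under the assumption that no valid date lies after a chosen cutoff (for example, birth dates cannot be in the future), shifts only the rows that land past it. The sample data and the `cutoff` value are hypothetical, and `pd.DateOffset` is swapped in for `relativedelta` because it applies to a whole Series.

```python
import pandas as pd

# Hypothetical sample column mixing 1900s and 2000s dates.
dates = pd.Series(["17-JUL-53", "05-MAR-99", "09-SEP-01"])
parsed = pd.to_datetime(dates, format="%d-%b-%y")

# Assumption: no legitimate date is later than this cutoff.
cutoff = pd.Timestamp("2024-01-01")

# Keep rows at or before the cutoff; move the rest back by one century.
fixed = parsed.where(parsed <= cutoff, parsed - pd.DateOffset(years=100))
print(fixed)
```

In practice the cutoff might be `pd.Timestamp.today()`; a fixed value is used here only so the result is reproducible.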
You need to state somehow which century you are in. In pandas this cannot be handled by the to_datetime function, so you need to do it upstream.
Here is an approach with regex:
import re
import pandas as pd
date = '17-JUL-53'
pd.to_datetime(re.sub(r'(\d{2}-\w{3}-)(\d{2})', r'\g<1>19\2', date))
#Timestamp('1953-07-17 00:00:00')
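The regex approach can also choose the century per value instead of hard-coding 19. The following is a rough sketch, not a pandas feature: the `expand_year` helper and the `PIVOT` value of 30 are assumptions made for illustration, and the right pivot depends on the data.

```python
import re
import pandas as pd

PIVOT = 30  # assumption: two-digit years above 30 belong to the 1900s

def expand_year(text, pivot=PIVOT):
    """Rewrite DD-MON-YY as DD-MON-YYYY, picking the century from a pivot."""
    def repl(match):
        century = "19" if int(match.group(2)) > pivot else "20"
        return f"{match.group(1)}{century}{match.group(2)}"
    return re.sub(r"^(\d{2}-\w{3}-)(\d{2})$", repl, text)

result = pd.to_datetime(expand_year("17-JUL-53"), format="%d-%b-%Y")
print(result)
```

Strings that do not match the DD-MON-YY pattern are returned unchanged, so they would still need separate handling before parsing.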