
Why does pandas to_datetime() convert DD-MM-YYYY dates into two different formats (YYYY-DD-MM and YYYY-MM-DD) within the same series?

I am a beginner at quantitative analysis of stocks using time series analysis, and I want to convert the date column of a pandas Series to datetime format. Before the conversion, the index of the series was

infy.index

Output:

Index(['03-01-2000', '04-01-2000', '05-01-2000', '06-01-2000', '07-01-2000',
       '10-01-2000', '11-01-2000', '12-01-2000', '13-01-2000', '14-01-2000',
       ...
       '16-04-2021', '19-04-2021', '20-04-2021', '22-04-2021', '23-04-2021',
       '26-04-2021', '27-04-2021', '28-04-2021', '29-04-2021', '30-04-2021'],
      dtype='object', name='Date', length=5306)

Currently, the format is DD-MM-YYYY. I then applied the following code to convert it:

infy.index = pd.to_datetime(infy.index)
infy.index

Output:

DatetimeIndex(['2000-03-01', '2000-04-01', '2000-05-01', '2000-06-01',
               '2000-07-01', '2000-10-01', '2000-11-01', '2000-12-01',
               '2000-01-13', '2000-01-14',
               ...
               '2021-04-16', '2021-04-19', '2021-04-20', '2021-04-22',
               '2021-04-23', '2021-04-26', '2021-04-27', '2021-04-28',
               '2021-04-29', '2021-04-30'],
              dtype='datetime64[ns]', name='Date', length=5306, freq=None)

The problem is that the first few dates come out as YYYY-DD-MM, while the dates at the end come out as YYYY-MM-DD. I thought I might have made an error earlier when converting the DataFrame to a Series, but after rechecking several times I couldn't find any problem.

Why is this happening?

The default datetime display format in pandas is YYYY-MM-DD, so datetime objects are always shown in that format. If you want to show the dates in another format, you can use:

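# A DatetimeIndex exposes strftime directly (the .dt accessor exists only on a Series); %Y is the 4-digit year
date_as_strings = pd.to_datetime(infy.index, dayfirst=True).strftime("%Y-%d-%m")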

Note: strftime converts the values to strings, so the result is no longer a datetime type.
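To illustrate the difference between the stored datetime values and their string rendering, here is a minimal sketch. The two sample dates follow the DD-MM-YYYY layout from the question, and the variable names are chosen only for illustration.

```python
import pandas as pd

# Two sample dates in the question's DD-MM-YYYY layout
idx = pd.to_datetime(["03-01-2000", "14-01-2000"], format="%d-%m-%Y")

# The parsed values are timestamps; pandas displays them as YYYY-MM-DD
print(idx[0].date())  # 2000-01-03

# strftime returns plain strings in whatever layout is requested
labels = idx.strftime("%d-%m-%Y")
print(list(labels))   # ['03-01-2000', '14-01-2000']
```

Keeping the datetime index for computation and calling strftime only when labels are needed avoids losing datetime functionality such as resampling or date slicing.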

pd.to_datetime, called without any additional arguments, will flexibly parse a single column whose dates are written in more than one format. While this can be extremely powerful, it is also error-prone.

The main issue here is that the default for the dayfirst argument of pd.to_datetime is False.

This means that for your first few dates, where the first part is <= 12 and can therefore be read as a month, pandas parses the string as MM-DD-YYYY (dayfirst is False, so the month comes first). Later, when it encounters a date whose first part is > 12, it recognizes that there are only 12 months in a year, so it assumes the format is DD-MM-YYYY and parses those dates accordingly.
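The effect of dayfirst on an ambiguous string can be seen on a single value. This is a small sketch using one date from the question; the variable names are illustrative, and the exact handling of mixed-format input may differ between pandas versions.

```python
import pandas as pd

ambiguous = "03-01-2000"  # could mean 3 January or 1 March

# dayfirst defaults to False, so the first field is read as the month
month_first = pd.to_datetime(ambiguous)

# With dayfirst=True the first field is read as the day
day_first = pd.to_datetime(ambiguous, dayfirst=True)

print(month_first.date())  # 2000-03-01
print(day_first.date())    # 2000-01-03
```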

This is clearly not the behavior you want, so you should either specify dayfirst=True or pass an explicit format that matches all of your dates.

pd.to_datetime(infy.index, format='%d-%m-%Y')
# or
pd.to_datetime(infy.index, dayfirst=True)

DatetimeIndex(['2000-01-03', '2000-01-04', '2000-01-05', '2000-01-06',
               '2000-01-07', '2000-01-10', '2000-01-11', '2000-01-12',
               '2000-01-13', '2000-01-14', 
               ...
               '2021-04-16', '2021-04-19',
               '2021-04-20', '2021-04-22', '2021-04-23', '2021-04-26',
               '2021-04-27', '2021-04-28', '2021-04-29', '2021-04-30'],
              dtype='datetime64[ns]', freq=None)
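One way to gain confidence that every row was parsed with the intended format is to format the result back to strings and compare it with the input. A rough sketch, assuming a small hypothetical index named raw that stands in for infy.index:

```python
import pandas as pd

# Hypothetical sample standing in for infy.index (DD-MM-YYYY strings)
raw = pd.Index(["03-01-2000", "12-01-2000", "13-01-2000", "30-04-2021"], name="Date")

# An explicit format leaves no room for guessing day vs. month
parsed = pd.to_datetime(raw, format="%d-%m-%Y")

# Formatting back with the same pattern should reproduce the input exactly
round_trip_ok = (parsed.strftime("%d-%m-%Y") == raw).all()
print(round_trip_ok)

# Day and month now land in the expected fields for every row
print(parsed.day.tolist())
print(parsed.month.tolist())
```

An explicit format also makes a malformed row fail loudly instead of being silently reinterpreted.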
