
How do I prevent the pandas.to_datetime() function from converting 0001-01-01 to 2001-01-01?

I have read-only access to a database that I query and read into a Pandas dataframe using pymssql. One of the variables contains dates, some of which are stored as midnight on 01 Jan 0001 (i.e. 0001-01-01 00:00:00.0000000). I've no idea why those dates are there: as far as I know, SQL Server does not recognise them as valid dates, and they are probably the result of some default data entry. Nevertheless, that's what I have to work with. The data can be recreated as a dataframe as follows:

import numpy as np
import pandas as pd

tempDF = pd.DataFrame({ 'id': [0,1,2,3,4],
                        'date': ['0001-01-01 00:00:00.0000000',
                                 '2015-05-22 00:00:00.0000000',
                                 '0001-01-01 00:00:00.0000000',
                                 '2015-05-06 00:00:00.0000000',
                                 '2015-05-03 00:00:00.0000000']})

The dataframe looks like:

print(tempDF)
                          date  id
0  0001-01-01 00:00:00.0000000   0
1  2015-05-22 00:00:00.0000000   1
2  0001-01-01 00:00:00.0000000   2
3  2015-05-06 00:00:00.0000000   3
4  2015-05-03 00:00:00.0000000   4

... with the following dtypes:

print(tempDF.dtypes)

date    object
id       int64
dtype: object

I routinely convert date fields in the dataframe to datetime format using:

tempDF['date'] = pd.to_datetime(tempDF['date'])

However, by chance, I've noticed that the 0001-01-01 date is converted to 2001-01-01:

print(tempDF)

        date  id
0 2001-01-01   0
1 2015-05-22   1
2 2001-01-01   2
3 2015-05-06   3
4 2015-05-03   4

I realise that the dates in the original database are incorrect because SQL Server doesn't see 0001-01-01 as a valid date. But at least in the 0001-01-01 format, such missing data are easy to identify within my Pandas dataframe. However, when pandas.to_datetime() changes these dates so that they lie within a feasible range, it is very easy to miss such outliers.

How can I make sure that pd.to_datetime doesn't interpret the outlier dates incorrectly?

If you provide a format, these dates will not be recognized:

In [92]: pd.to_datetime(tempDF['date'], format="%Y-%m-%d %H:%M:%S.%f", errors='coerce')
Out[92]:
0          NaT
1   2015-05-22
2          NaT
3   2015-05-06
4   2015-05-03
Name: date, dtype: datetime64[ns]

By default this raises an error, because year 1 lies outside the range that a nanosecond-resolution Timestamp can represent (roughly 1677 to 2262). By passing errors='coerce', those values are converted to NaT instead (coerce=True for older pandas versions).
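As a minimal sketch of how this could be applied to the question's data, the placeholder rows can be flagged on the raw strings and blanked out before parsing, so the result does not depend on how a given pandas version treats out-of-range years. The names `temp_df`, `is_placeholder` and `cleaned` are illustrative, and the sketch assumes that every placeholder value starts with `0001-01-01`.

```python
import pandas as pd

temp_df = pd.DataFrame({
    "id": [0, 1, 2, 3, 4],
    "date": [
        "0001-01-01 00:00:00.0000000",
        "2015-05-22 00:00:00.0000000",
        "0001-01-01 00:00:00.0000000",
        "2015-05-06 00:00:00.0000000",
        "2015-05-03 00:00:00.0000000",
    ],
})

# Assumption: placeholder dates always begin with "0001-01-01".
# Flag them on the raw strings, before any parsing can alter them.
is_placeholder = temp_df["date"].str.startswith("0001-01-01")

# Replace the placeholder strings with missing values.
cleaned = temp_df["date"].where(~is_placeholder)

# An explicit format avoids heuristic parsing of the remaining strings;
# errors="coerce" turns anything else that fails to parse into NaT.
parsed = pd.to_datetime(
    cleaned, format="%Y-%m-%d %H:%M:%S.%f", errors="coerce"
)

# Keep the flag so the placeholder rows stay easy to find later.
temp_df["date"] = parsed
temp_df["date_was_placeholder"] = is_placeholder
print(temp_df)
```

Keeping the boolean column alongside the parsed dates distinguishes "placeholder in the source" from any other value that might fail to parse.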

The reason pandas converts these "0001-01-01" dates to "2001-01-01" when no format is provided is that it falls back on dateutil, which behaves this way (at least in the version used here):

In [32]: import dateutil.parser

In [33]: dateutil.parser.parse("0001-01-01")
Out[33]: datetime.datetime(2001, 1, 1, 0, 0)
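For comparison, parsing with an explicit format in the standard library involves no guessing about the century. The following is only a sketch, not part of the original answer: it assumes the seven-digit fractional seconds shown in the question, and trims the last digit because the standard library's `%f` directive accepts at most six.

```python
from datetime import datetime

raw = "0001-01-01 00:00:00.0000000"

# The stdlib %f directive accepts at most six fractional digits,
# so drop the seventh digit from the database string.
trimmed = raw[:26]

# With an explicit format, "0001" is read literally as year 1.
parsed = datetime.strptime(trimmed, "%Y-%m-%d %H:%M:%S.%f")
print(parsed.year)  # 1
```

A plain `datetime` can represent year 1, so this is one way to inspect such values without converting them to a pandas Timestamp.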

