

Python Pandas out-of-bounds datetime timestamp error for a long DataFrame

I have a DataFrame df with shape (500000, 70) and several columns that contain invalid dates such as 4000-01-01 00:00:00. On a smaller version of this DataFrame I tried:

df["date"] = df["date"].astype(str)
df["date"] = df["date"].replace('4000-01-01 00:00:00', pd.NaT)

which worked fine. The following version also worked:

df["date"] = pd.to_datetime(df["date"].replace("4000-01-01 00:00:00", pd.NaT))

For the long DataFrame, however, I receive the following error:

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 4000-01-01 00:00:00

Any suggestions on how to solve this problem elegantly, or on what the problem might be?

Thank you.

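If you add the argument errors='coerce' to the to_datetime call, to_datetime returns NaT for every value it cannot convert, including dates outside the supported range: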

df["date"] = pd.to_datetime(df["date"], errors='coerce')
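Because errors='coerce' also silently turns genuinely malformed values into NaT, it can help to blank out the known placeholder explicitly first. The sketch below assumes the placeholder is exactly the string "4000-01-01 00:00:00" from the question; `parse_dates` and `SENTINEL` are hypothetical names. Converting to strings first means the placeholder also matches when the column holds datetime.datetime objects rather than strings, and matching it explicitly keeps the result independent of whether a given pandas version treats year 4000 as out of range.

```python
import pandas as pd

# Hypothetical name; the value is the placeholder date from the question.
SENTINEL = "4000-01-01 00:00:00"

def parse_dates(s: pd.Series, sentinel: str = SENTINEL) -> pd.Series:
    # Normalise to strings so the sentinel matches whether the column
    # holds str values or datetime.datetime objects (str(dt) gives the same text).
    as_text = s.astype(str)
    # Replace the known placeholder with NaN explicitly ...
    as_text = as_text.mask(as_text == sentinel)
    # ... and let errors="coerce" turn anything else unparsable or out of range into NaT.
    return pd.to_datetime(as_text, errors="coerce")

parsed = parse_dates(pd.Series(["2019-05-01 00:00:00", SENTINEL, "not a date"]))
print(parsed.isna().tolist())  # [False, True, True]
```

Comparing how many NaT values come from the placeholder versus from coercion makes it easier to spot unexpected bad data in the 500000-row frame.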

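The error occurs because pandas stores these timestamps as a 64-bit count of nanoseconds since 1970-01-01, which covers only about ±292 years: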

In [332]: pd.Timestamp.max
Out[332]: Timestamp('2262-04-11 23:47:16.854775807')

That is the upper limit for dates (the lower limit, pd.Timestamp.min, is 1677-09-21 00:12:43.145224193). Your value lies outside this range, hence the OutOfBoundsDatetime error.
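To see which values actually trigger the error (for example, to check whether the long frame contains placeholders that the smaller version did not), a sketch like the following lists every entry that parses as a date outside those bounds. It assumes the column holds ISO-formatted strings like the ones in the question; `out_of_bounds_values`, `LOWER` and `UPPER` are hypothetical names, and the bounds are only compared to microsecond precision.

```python
from datetime import datetime

import pandas as pd

# Nanosecond Timestamp bounds as plain Python datetimes (microsecond precision);
# warn=False silences the warning about discarding the nanosecond part.
LOWER = pd.Timestamp.min.to_pydatetime(warn=False)
UPPER = pd.Timestamp.max.to_pydatetime(warn=False)

def out_of_bounds_values(s: pd.Series) -> pd.Series:
    """Return the entries that parse as ISO dates outside [LOWER, UPPER]."""
    def is_out_of_bounds(value) -> bool:
        try:
            dt = datetime.fromisoformat(str(value))
        except ValueError:
            return False  # not a date at all: a parse problem, not a bounds problem
        return dt < LOWER or dt > UPPER
    return s[s.map(is_out_of_bounds)]

dates = pd.Series(["2019-05-01 00:00:00", "4000-01-01 00:00:00", "1500-06-30 00:00:00"])
print(out_of_bounds_values(dates).tolist())  # ['4000-01-01 00:00:00', '1500-06-30 00:00:00']
```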
