
Python Pandas out of bounds datetime timestamp error for long dataframe

I have a data frame df with shape (500000, 70) that has several columns containing invalid dates such as 4000-01-01 00:00:00. On a smaller version of this data frame I tried

df["date"] = df["date"].astype(str)
df["date"] = df["date"].replace('4000-01-01 00:00:00', pd.NaT)

which worked fine. The version

df["date"] = pd.to_datetime(df["date"].replace("4000-01-01 00:00:00",pd.NaT))

also worked. For the full data frame, however, I receive the following error:

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 4000-01-01 00:00:00

Any suggestions on how to solve this problem elegantly, or on what the cause might be?

Thank you.

If you add the parameter errors='coerce' to the to_datetime function, to_datetime returns NaT for every datetime it cannot parse:

df["date"] = pd.to_datetime(df["date"], errors='coerce')

The error occurs because of the upper bound on pandas timestamps:

In [332]: pd.Timestamp.max
Out[332]: Timestamp('2262-04-11 23:47:16.854775807')

This is the upper limit of a nanosecond-resolution timestamp. Your value 4000-01-01 00:00:00 lies outside that range, hence the OutOfBoundsDatetime error.
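For illustration, a short sketch (assuming a pandas version where datetime64[ns] is the default for to_datetime) showing the representable range and the error the default errors='raise' produces:

import pandas as pd

# datetime64[ns] only covers roughly 1677-09-21 through 2262-04-11.
print(pd.Timestamp.min)  # 1677-09-21 00:12:43.145224193
print(pd.Timestamp.max)  # 2262-04-11 23:47:16.854775807

try:
    # Default behaviour (errors='raise') fails on the out-of-range date.
    pd.to_datetime(["4000-01-01 00:00:00"])
except pd.errors.OutOfBoundsDatetime as exc:
    print("OutOfBoundsDatetime:", exc)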
