I have a data frame df with shape (500000, 70) and several columns that include invalid dates like 4000-01-01 00:00:00. In a smaller version of this data frame I tried
df["date"] = df["date"].astype(str)
df["date"] = df["date"].replace('4000-01-01 00:00:00', pd.NaT)
which worked fine. Also the version
df["date"] = pd.to_datetime(df["date"].replace("4000-01-01 00:00:00",pd.NaT))
worked. For the long data frame version I receive the following error
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 4000-01-01 00:00:00
Any suggestions on how to solve this elegantly, or on what the cause might be?
Thank you.
If you pass errors='coerce' to to_datetime, it returns NaT for every date-time it cannot parse:
df["date"] = pd.to_datetime(df["date"], errors='coerce')
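Since the question mentions several affected columns, the same call can be applied in a loop. The following is only a minimal sketch: the column names (`created`, `expires`) and the sample values are made up for illustration. It also blanks the placeholder explicitly before parsing, so the outcome does not depend on how a particular pandas version treats far-future dates.

```python
import pandas as pd

SENTINEL = "4000-01-01 00:00:00"  # placeholder date from the question

# Hypothetical sample data; the real frame has 500000 rows and 70 columns.
df = pd.DataFrame({
    "created": ["2020-05-17 08:30:00", SENTINEL, "2021-12-31 23:59:59"],
    "expires": [SENTINEL, "2022-01-01 00:00:00", "not a date"],
})

for col in ["created", "expires"]:
    text = df[col].astype(str)
    # Blank out the placeholder first, then let errors="coerce" turn
    # anything else that cannot be parsed into NaT.
    df[col] = pd.to_datetime(text.where(text != SENTINEL), errors="coerce")

print(df)
```

Keep in mind that errors='coerce' silently discards every value it cannot parse, not only the placeholder, so it may be worth counting the NaT values before and after the conversion.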
The error is because:
In [332]: pd.Timestamp.max
Out[332]: Timestamp('2262-04-11 23:47:16.854775807')
This is the latest date a nanosecond-resolution Timestamp can represent. Your value, 4000-01-01 00:00:00, lies beyond it, hence the OutOfBoundsDatetime error.
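To see which raw values fall outside that range before converting, a coarse check on the year can help. This is a rough sketch that assumes the dates are stored as strings starting with a four-digit year, as in the question; the sample series `raw` is hypothetical.

```python
import pandas as pd

# Hypothetical sample of raw date strings in the question's format.
raw = pd.Series(["2020-05-17 08:30:00", "4000-01-01 00:00:00"])

# Representable range for nanosecond-resolution timestamps.
print(pd.Timestamp.min, pd.Timestamp.max)

# Coarse check: compare only the year against the boundary years.
years = raw.str.slice(0, 4).astype(int)
out_of_range = (years < pd.Timestamp.min.year) | (years > pd.Timestamp.max.year)

print(raw[out_of_range])
```

The check ignores the partial boundary years themselves, so it is meant for spotting placeholders like 4000-01-01, not for exact validation.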