
Pandas, Handling “Out of bounds timestamp…”

I have a df in which certain features are stored as object dtype, and I want to convert them to datetime. When I attempt the conversion with pd.to_datetime, some of these features raise an "Out of bounds timestamp" error. To address this, I add the errors='coerce' argument and then try to drop all the resulting NAs. For example:

pd.to_datetime(df[date_features], infer_datetime_format = True, errors = 'coerce')
df[date_features].dropna(inplace= True)

Yet this doesn't seem to convert the features to datetime ("maturity_date" is one of the date_features I am trying to convert):

df['maturity_date'].describe()

count        3355323
unique         11954
top       2015-12-01
freq           29607
Name: maturity_date, dtype: object

Furthermore, if I try again to convert maturity_date using pd.to_datetime without errors='coerce', I get the "Out of bounds timestamp" error.

I hope I have described this problem thoroughly.

Any thoughts?

pd.to_datetime is not an in-place operation. Your code performs the conversion and then discards the result. The right thing to do is to assign the result back, like so:

df['date_features'] = pd.to_datetime(df.date_features, errors='coerce')
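Since date_features in the question appears to be a list of column names rather than a single column, one way to convert several columns at once is to apply pd.to_datetime column by column and assign the whole result back. This is only a sketch: the sample frame, the issue_date column, and all values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; only maturity_date is a name taken from the question.
df = pd.DataFrame({
    "maturity_date": ["2017-04-01", "2016-01-15", "9999-12-31"],
    "issue_date": ["2015-12-01", "bad value", "2014-06-30"],
    "amount": [100, 200, 300],
})
date_features = ["maturity_date", "issue_date"]

# apply() calls pd.to_datetime once per column and forwards errors="coerce".
# Values that cannot be parsed (or that fall outside the representable range)
# become NaT instead of raising. The result must be assigned back.
df[date_features] = df[date_features].apply(pd.to_datetime, errors="coerce")

print(df.dtypes)
```

With nanosecond resolution, representable timestamps run from roughly the year 1677 to 2262, which is why a date such as 9999-12-31 triggers the out-of-bounds error when errors='coerce' is not used.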

Furthermore, don't call dropna on a column that belongs to a dataframe, as this will not modify the dataframe (even with inplace=True). Instead, call dropna on the dataframe with the subset argument:

df.dropna(subset=['date_features'], inplace=True)
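Putting the two steps together, a minimal sketch might look like the following. The data is invented, and an unparseable string stands in for any value that errors='coerce' turns into NaT.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "maturity_date": ["2017-04-01", "not a date", "2016-01-15"],
    "amount": [100, 200, 300],
})

# Step 1: convert and assign back; the unparseable entry becomes NaT.
df["maturity_date"] = pd.to_datetime(df["maturity_date"], errors="coerce")

# Step 2: drop whole rows of the dataframe where the date column is missing.
rows_before = len(df)
df.dropna(subset=["maturity_date"], inplace=True)
rows_after = len(df)

print(rows_before, rows_after)  # 3 2
```

Passing subset as a list keeps the call compatible with pandas versions that expect a list-like there.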

Now maturity_date will look like this:

results["maturity_date"].head()

0   2017-04-01
1   2017-04-01
2   2017-04-01
3   2016-01-15
4   2016-01-15
Name: maturity_date, dtype: datetime64[ns]

As you can see, the dtype is datetime64, meaning the conversion worked. If you call describe(), it performs a few standard aggregations and returns the results as a new series. That series is displayed like any other, including a dtype line that applies to the summary series itself, not to the column it describes.
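To check a column's type without being misled by the summary, it may be easier to inspect the column's dtype directly rather than reading it off the describe() output. A small sketch with made-up values:

```python
import pandas as pd

# Hypothetical datetime column, named after the one in the question.
s = pd.Series(
    pd.to_datetime(["2017-04-01", "2017-04-01", "2016-01-15"]),
    name="maturity_date",
)

# describe() builds a new Series of summary statistics.
summary = s.describe()

print(s.dtype)        # dtype of the column itself
print(summary.dtype)  # dtype of the summary Series, which mixes value types
```

The exact set of statistics that describe() reports for datetime data differs between pandas versions, so the sketch only compares the two dtypes.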
