Pandas, Handling “Out of bounds timestamp…”
I have a df with certain features stored as object dtype that I want to convert to datetime. When I attempt the conversion with pd.to_datetime, some of these features raise an "Out of bounds timestamp" error. To address this, I added the errors='coerce' argument and then tried to drop all the NAs that result. For example:
pd.to_datetime(df[date_features], infer_datetime_format = True, errors = 'coerce')
df[date_features].dropna(inplace= True)
Yet this doesn't seem to convert the features to datetime ("maturity_date" is one of the date_features I am trying to convert):
df['maturity_date'].describe()
count 3355323
unique 11954
top 2015-12-01
freq 29607
Name: maturity_date, dtype: object
Furthermore, if I try again to convert maturity_date using pd.to_datetime without errors='coerce', I get the "Out of bounds timestamp" error.
I hope I have described this problem thoroughly.
Any thoughts?
pd.to_datetime is not an inplace operation. Your code performs a conversion and then discards the result. The right thing to do would be to assign the result back, like so -
df['date_features'] = pd.to_datetime(df.date_features, errors='coerce')
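In the question, date_features appears to be a list of column names rather than a single column. A minimal sketch of the same assign-back pattern for that case follows. The frame, the issue_date column, and the sample values are invented for illustration. Whether a far-future date such as 9999-12-31 is coerced to NaT can depend on the pandas version, so the sketch also includes a string that cannot be parsed at all.

```python
import pandas as pd

# Hypothetical data: column names and values are made up for this sketch.
df = pd.DataFrame({
    "maturity_date": ["2017-04-01", "2016-01-15", "9999-12-31"],
    "issue_date": ["2015-12-01", "not a date", "2014-06-30"],
})
date_features = ["maturity_date", "issue_date"]

# Convert each column separately and assign the result back,
# because pd.to_datetime returns a new object instead of modifying df.
df[date_features] = df[date_features].apply(pd.to_datetime, errors="coerce")

print(df.dtypes)
print(df)
```

After the assignment both columns should report a datetime64 dtype, and any value that could not be converted shows up as NaT.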
Furthermore, don't call dropna on a column that belongs to a dataframe, as this will not modify the dataframe (even with inplace=True). Instead, call dropna on the dataframe with a subset argument -
df.dropna(subset=['date_features'], inplace=True)
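As a rough illustration of dropping rows by one column, here is a small sketch. The amount column and the values are hypothetical, and the result is reassigned instead of using inplace=True, which is an equivalent choice for this purpose.

```python
import pandas as pd

# Hypothetical frame with one missing date (NaT) in maturity_date.
df = pd.DataFrame({
    "maturity_date": pd.to_datetime(["2017-04-01", None, "2016-01-15"]),
    "amount": [100, 200, 300],
})
rows_before = len(df)

# Drop whole rows where maturity_date is missing; other columns are not checked.
df = df.dropna(subset=["maturity_date"])

print(rows_before, len(df))
```

Only the row whose maturity_date is NaT is removed, and the amount values of the remaining rows are kept.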
Now, as observed, maturity_date will look like this -
results["maturity_date"].head()
0 2017-04-01
1 2017-04-01
2 2017-04-01
3 2016-01-15
4 2016-01-15
Name: maturity_date, dtype: datetime64[ns]
As you can see, the dtype is datetime64, meaning this operation worked. If you call describe(), it performs a few standard aggregations and returns the results as a new series. This series is displayed in the same way as any other, including a dtype description that applies to it, not the column it is describing.
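To make the distinction concrete, the sketch below compares a column's own dtype with the dtype of its describe() summary. The sample values are invented, and the raw series is explicitly created with dtype=object to mimic the column in the question.

```python
import pandas as pd

# Hypothetical raw column of date strings, stored as object dtype.
raw = pd.Series(
    ["2015-12-01", "2015-12-01", "2017-04-01"],
    name="maturity_date",
    dtype=object,
)
converted = pd.to_datetime(raw)

# describe() builds a separate summary Series with its own dtype.
summary = raw.describe()

print(raw.dtype)        # dtype of the unconverted column
print(converted.dtype)  # dtype of the converted column
print(summary.dtype)    # dtype of the summary, not of the column
```

Checking the column's dtype attribute directly is the more reliable way to confirm that a conversion took effect.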