Pandas, handling "Out of bounds timestamp..."

Question

I have a df with certain features stored as object dtype that I want to convert to datetime. When I attempt the conversion with pd.to_datetime, some of these features raise an "Out of bounds timestamp" error. To address this, I added the errors='coerce' argument and then tried to drop all the resulting NAs. For example:

pd.to_datetime(df[date_features], infer_datetime_format = True, errors = 'coerce')
df[date_features].dropna(inplace= True)

Yet this doesn't seem to convert the features to datetime ("maturity_date" is one of the date_features I am trying to convert):

df['maturity_date'].describe()

count        3355323
unique         11954
top       2015-12-01
freq           29607
Name: maturity_date, dtype: object

Furthermore, if I try again to convert maturity_date using pd.to_datetime without errors='coerce', I get the "Out of bounds timestamp" error.

I hope I have described this problem thoroughly.

Any thoughts?

Answer 1

pd.to_datetime is not an in-place operation. Your code performs the conversion and then discards the result. The right thing to do is to assign the result back, like so:

df['date_features'] = pd.to_datetime(df.date_features, errors='coerce')
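
To see the difference between calling pd.to_datetime and assigning its result, here is a minimal sketch on a made-up frame. The column values are assumptions for illustration, and the NaT behaviour described in the comments assumes a pandas version that stores datetimes at nanosecond resolution.

```python
import pandas as pd

# Hypothetical data: the last date is past the nanosecond Timestamp limit (year 2262)
df = pd.DataFrame({"maturity_date": ["2017-04-01", "2016-01-15", "3000-01-01"]})

# Calling to_datetime without assigning leaves the column untouched
pd.to_datetime(df["maturity_date"], errors="coerce")
dtype_before = df["maturity_date"].dtype  # still the original string/object dtype

# Assigning the result back replaces the column with datetime values
df["maturity_date"] = pd.to_datetime(df["maturity_date"], errors="coerce")
dtype_after = df["maturity_date"].dtype

# With nanosecond resolution, the year-3000 value is coerced to NaT
n_missing = df["maturity_date"].isna().sum()
```

Comparing dtype_before with dtype_after shows that only the assignment changes the dataframe.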

Furthermore, don't call dropna on a column that belongs to a dataframe, as this will not modify the dataframe (even with inplace=True). Instead, call dropna on the dataframe with the subset argument:

df.dropna(subset=['date_features'], inplace=True)
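
In the question, date_features appears to be a list of column names rather than a single column. Under that assumption, one possible sketch converts each column in turn and then drops the affected rows on the dataframe itself. The frame and the issue_date column are made up for illustration, and "not a date" stands in for any value that fails to convert.

```python
import pandas as pd

# Hypothetical frame with two date-like string columns
df = pd.DataFrame({
    "issue_date": ["2015-12-01", "2016-01-15", "2016-02-01"],
    "maturity_date": ["2017-04-01", "not a date", "2016-01-15"],
})
date_features = ["issue_date", "maturity_date"]  # assumed list of column names

# Convert each listed column separately and assign the results back
df[date_features] = df[date_features].apply(pd.to_datetime, errors="coerce")

# Drop rows where any listed column is NaT, operating on the dataframe
df = df.dropna(subset=date_features)
```

Reassigning df = df.dropna(...) is equivalent here to passing inplace=True, and it makes it harder to discard the result by accident.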

Now, as observed, maturity_date will look like this:

results["maturity_date"].head()

0   2017-04-01
1   2017-04-01
2   2017-04-01
3   2016-01-15
4   2016-01-15
Name: maturity_date, dtype: datetime64[ns]

As you can see, the dtype is datetime64, meaning the operation worked. If you call describe(), it performs a few standard aggregations and returns the results as a new series. That series is displayed like any other, including a dtype line that applies to the summary series itself, not to the column it describes.
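
A small sketch of that last point, using made-up values: the dtype printed under a describe() result belongs to the summary, so it is safer to read the column's dtype directly.

```python
import pandas as pd

# Hypothetical datetime column
s = pd.Series(
    pd.to_datetime(["2017-04-01", "2017-04-01", "2016-01-15"]),
    name="maturity_date",
)

summary = s.describe()         # a new Series holding the aggregates
column_dtype = s.dtype         # dtype of the data itself
summary_dtype = summary.dtype  # dtype of the summary, which can differ
```

The exact rows in summary depend on the pandas version, but column_dtype is the value that tells you whether the conversion worked.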

Answer 1
3, accepted, 2017-12-18 17:36:21