How to convert object dtype to float64 and int64 after updating pandas, so that NaN values can be interpolated

I updated my Anaconda environment, which also updated the associated libraries such as pandas. Code that used to work now raises the following error:

ValueError: Invalid fill method. Expecting pad (ffill) or backfill (bfill). Got linear

When I inspect the data types (df.dtypes), every column is reported as object, whereas with the previous pandas version they were shown as int64 and float64.

I have looked at several Stack Overflow threads about similar issues. They suggest the problem might lie with the date column, but following their advice did not solve it.

My code looks like this:

import datetime as dt  # needed for dt.time(...) further down
import pandas as pd

df['Timestamp'] = pd.to_datetime(df['Timestamp'])
def fault_mapper_FD(faultDate):
    if pd.Timestamp(2017, 8, 27, 0) <= faultDate <= pd.Timestamp(2017, 8, 28, 0):
        return 0
    if pd.Timestamp(2017, 8, 29, 0) <= faultDate <= pd.Timestamp(2017, 8, 29, 23, 59):
        return 0
    if pd.Timestamp(2017, 12, 1, 0) <= faultDate <= pd.Timestamp(2017, 12, 1, 23, 59):
        return 0
    if pd.Timestamp(2017, 12, 3, 0) <= faultDate <= pd.Timestamp(2017, 12, 3, 23, 59):
        return 0
    if pd.Timestamp(2017, 12, 7, 0) <= faultDate <= pd.Timestamp(2017, 12, 8, 0):
        return 0
    if pd.Timestamp(2017, 12, 14, 0) <= faultDate <= pd.Timestamp(2017, 12, 14, 23, 59):
        return 0
    if pd.Timestamp(2018, 2, 7, 0) <= faultDate <= pd.Timestamp(2018, 2, 7, 23, 59):
        return 0
    if pd.Timestamp(2018, 2, 9, 0) <= faultDate <= pd.Timestamp(2018, 2, 9, 23, 59):
        return 0
    if pd.Timestamp(2017, 12, 20, 0) <= faultDate <= pd.Timestamp(2017, 12, 20, 23, 59):
        return 0
    if pd.Timestamp(2018, 2, 18, 0) <= faultDate <= pd.Timestamp(2018, 2, 18, 23, 59):
        return 0
    if pd.Timestamp(2018, 2, 1, 0) <= faultDate <= pd.Timestamp(2018, 2, 1, 23, 59):
        return 0
    if pd.Timestamp(2018, 1, 31, 0) <= faultDate <= pd.Timestamp(2018, 1, 31, 23, 59):
        return 0
    if pd.Timestamp(2018, 1, 28, 0) <= faultDate <= pd.Timestamp(2018, 1, 28, 23, 59):
        return 0
    if pd.Timestamp(2018, 1, 27, 0) <= faultDate <= pd.Timestamp(2018, 1, 27, 23, 59):
        return 0
    if (pd.Timestamp(2017, 9, 1, 0) <= faultDate <= pd.Timestamp(2017, 9, 1, 23, 59) or 
    pd.Timestamp(2017, 11, 30, 0) <= faultDate <= pd.Timestamp(2017, 11, 30, 23, 59) or 
    pd.Timestamp(2017, 12, 9, 0) <= faultDate <= pd.Timestamp(2017, 12, 9, 23, 59) or 
    pd.Timestamp(2017, 12, 10, 0) <= faultDate <= pd.Timestamp(2017, 12, 11, 0) or 
    pd.Timestamp(2017, 12, 24, 0) <= faultDate <= pd.Timestamp(2017, 12, 24, 23, 59) or 
    pd.Timestamp(2018, 2, 4, 0) <= faultDate <= pd.Timestamp(2018, 2, 4, 23, 59) or 
    pd.Timestamp(2018, 2, 5, 0) <= faultDate <= pd.Timestamp(2018, 2, 6, 0)):
        return 1

df['FD'] = df['Timestamp'].apply(fault_mapper_FD)

cond = (df.Timestamp.dt.time > dt.time(22,0)) | ((df.Timestamp.dt.time < dt.time(7,0)))
df[cond] = df[cond].fillna(0,axis=1)

When I try to interpolate:

df.interpolate(method='linear', limit_direction='backward', inplace=True)

I get this error:

    ---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [6], in <cell line: 1>()
----> 1 df.interpolate(method ='linear', limit_direction ='backward', inplace=True)

File ~\anaconda3\lib\site-packages\pandas\util\_decorators.py:311, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    305 if len(args) > num_allow_args:
    306     warnings.warn(
    307         msg.format(arguments=arguments),
    308         FutureWarning,
    309         stacklevel=stacklevel,
    310     )
--> 311 return func(*args, **kwargs)

File ~\anaconda3\lib\site-packages\pandas\core\frame.py:10931, in DataFrame.interpolate(self, method, axis, limit, inplace, limit_direction, limit_area, downcast, **kwargs)
  10919 @deprecate_nonkeyword_arguments(version=None, allowed_args=["self", "method"])
  10920 def interpolate(
  10921     self: DataFrame,
   (...)
  10929     **kwargs,
  10930 ) -> DataFrame | None:
> 10931     return super().interpolate(
  10932         method,
  10933         axis,
  10934         limit,
  10935         inplace,
  10936         limit_direction,
  10937         limit_area,
  10938         downcast,
  10939         **kwargs,
  10940     )

File ~\anaconda3\lib\site-packages\pandas\core\generic.py:7034, in NDFrame.interpolate(self, method, axis, limit, inplace, limit_direction, limit_area, downcast, **kwargs)
   7028 if isna(index).any():
   7029     raise NotImplementedError(
   7030         "Interpolation with NaNs in the index "
   7031         "has not been implemented. Try filling "
   7032         "those NaNs before interpolating."
   7033     )
-> 7034 new_data = obj._mgr.interpolate(
   7035     method=method,
   7036     axis=axis,
   7037     index=index,
   7038     limit=limit,
   7039     limit_direction=limit_direction,
   7040     limit_area=limit_area,
   7041     inplace=inplace,
   7042     downcast=downcast,
   7043     **kwargs,
   7044 )
   7046 result = self._constructor(new_data)
   7047 if should_transpose:

File ~\anaconda3\lib\site-packages\pandas\core\internals\managers.py:359, in BaseBlockManager.interpolate(self, **kwargs)
    358 def interpolate(self: T, **kwargs) -> T:
--> 359     return self.apply("interpolate", **kwargs)

File ~\anaconda3\lib\site-packages\pandas\core\internals\managers.py:304, in BaseBlockManager.apply(self, f, align_keys, ignore_failures, **kwargs)
    302         applied = b.apply(f, **kwargs)
    303     else:
--> 304         applied = getattr(b, f)(**kwargs)
    305 except (TypeError, NotImplementedError):
    306     if not ignore_failures:

File ~\anaconda3\lib\site-packages\pandas\core\internals\blocks.py:1482, in EABackedBlock.interpolate(self, method, axis, inplace, limit, fill_value, **kwargs)
   1480     new_values = values.T.fillna(value=fill_value, method=method, limit=limit).T
   1481 else:
-> 1482     new_values = values.fillna(value=fill_value, method=method, limit=limit)
   1483 return self.make_block_same_class(new_values)

File ~\anaconda3\lib\site-packages\pandas\core\arrays\_mixins.py:300, in NDArrayBackedExtensionArray.fillna(self, value, method, limit)
    296 @doc(ExtensionArray.fillna)
    297 def fillna(
    298     self: NDArrayBackedExtensionArrayT, value=None, method=None, limit=None
    299 ) -> NDArrayBackedExtensionArrayT:
--> 300     value, method = validate_fillna_kwargs(
    301         value, method, validate_scalar_dict_value=False
    302     )
    304     mask = self.isna()
    305     # error: Argument 2 to "check_value_size" has incompatible type
    306     # "ExtensionArray"; expected "ndarray"

File ~\anaconda3\lib\site-packages\pandas\util\_validators.py:378, in validate_fillna_kwargs(value, method, validate_scalar_dict_value)
    376     raise ValueError("Must specify a fill 'value' or 'method'.")
    377 elif value is None and method is not None:
--> 378     method = clean_fill_method(method)
    380 elif value is not None and method is None:
    381     if validate_scalar_dict_value and isinstance(value, (list, tuple)):

File ~\anaconda3\lib\site-packages\pandas\core\missing.py:125, in clean_fill_method(method, allow_nearest)
    123     expecting = "pad (ffill), backfill (bfill) or nearest"
    124 if method not in valid_methods:
--> 125     raise ValueError(f"Invalid fill method. Expecting {expecting}. Got {method}")
    126 return method

ValueError: Invalid fill method. Expecting pad (ffill) or backfill (bfill). Got linear

I also tried the following solution found on Stack Overflow, but it did not help:

for col in df:
    df[col] = pd.to_numeric(df[col], errors='coerce')
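
The loop above also runs pd.to_numeric over the Timestamp column. A minimal sketch of a narrower conversion, assuming the measurement columns hold numbers stored as strings, is to convert every column except the timestamp. The sample frame and the column names temp and count are made up for illustration and are not taken from the question's data.

```python
import pandas as pd

# Hypothetical data: every column starts out as object dtype.
df = pd.DataFrame(
    {
        "Timestamp": ["2017-08-27 00:00", "2017-08-27 01:00", "2017-08-27 02:00"],
        "temp": ["1.5", "bad", "3.5"],
        "count": ["1", "2", "3"],
    },
    dtype=object,
)
df["Timestamp"] = pd.to_datetime(df["Timestamp"])

# Convert everything except the timestamp column; unparseable
# entries become NaN because of errors="coerce".
value_cols = df.columns.drop("Timestamp")
df[value_cols] = df[value_cols].apply(pd.to_numeric, errors="coerce")

print(df.dtypes)
```

After this step df.dtypes should report numeric dtypes for the converted columns, and any value that could not be parsed shows up as NaN, ready to be interpolated.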

import pandas as pd

df = pd.DataFrame({'time': pd.to_datetime(['2010', '2011', 'foo', '2012', '2013'], 
                                          errors='coerce')})
df['time'] = df.time.dt.tz_localize('UTC').dt.tz_convert('Asia/Kolkata')
df.interpolate()
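
The traceback in the question passes through EABackedBlock.interpolate and NDArrayBackedExtensionArray.fillna, which suggests that the failing column is backed by an extension array, such as a datetime column, rather than a plain numeric one. One possible workaround, sketched here under that assumption, is to interpolate only the numeric columns and leave the others untouched. The sample frame and the column name value are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one datetime column and one numeric column with gaps.
df = pd.DataFrame(
    {
        "Timestamp": pd.to_datetime(
            ["2017-08-27 00:00", "2017-08-27 01:00",
             "2017-08-27 02:00", "2017-08-27 03:00"]
        ),
        "value": [np.nan, 1.0, np.nan, 3.0],
    }
)

# Select only numeric columns so the datetime column never reaches interpolate().
num_cols = df.select_dtypes(include="number").columns
df[num_cols] = df[num_cols].interpolate(method="linear", limit_direction="backward")

print(df)
```

Assigning the result back instead of using inplace=True keeps the operation limited to the selected columns. If the numeric columns still show up as object in df.dtypes, they would need to be converted first, since select_dtypes(include="number") skips object columns.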

By default, df.interpolate(method='linear') forward-fills NaNs after the last valid value. That is rather surprising, given that the method name only mentions "interpolate".

To restrict df.interpolate to filling only the NaNs that lie between valid (non-NaN) values, use limit_area='inside', available as of pandas version 0.23.0.

import pandas as pd
import numpy as np
a = pd.DataFrame({'col1': [np.nan, 1, np.nan, 3, np.nan, 5, np.nan]})
a['linear'] = a.interpolate(method='linear')['col1']
a['linear inside'] = a.interpolate(method='linear', limit_area='inside')['col1']
print(a)
