How can I replace <NA> with NaN in a Pandas DataFrame?

Question

Some columns in my DataFrame have instances of <NA>, which are of type pandas._libs.missing.NAType.

I'd like to replace them with NaN using np.nan.

I have seen questions where the instances of <NA> can be replaced when using pd.read_csv().

But since my Pandas DataFrame is created from a Spark DataFrame, I do not use the pd.read_csv() function.

Please advise.

Answer 1

Use replace, but you also need to upgrade pandas.

import numpy as np
import pandas as pd

df = pd.DataFrame({'age':[pd.NA, 4, 8]})

df = df.replace(pd.NA, np.nan)  # swap every pd.NA for np.nan
print(df)
   age
0  NaN
1  4.0
2  8.0
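Whether replace also converts the object column to float64 depends on the pandas version: the output above shows it did, but pandas 2.2 deprecated this implicit downcasting. A minimal sketch, assuming you want a float64 column regardless of version, is to call infer_objects() explicitly after the replace:

```python
import numpy as np
import pandas as pd

# An object-dtype column that holds pd.NA, as in the answer above.
df = pd.DataFrame({"age": [pd.NA, 4, 8]})

# Swap every pd.NA for np.nan.
df = df.replace(pd.NA, np.nan)

# Make the object -> float64 conversion explicit instead of relying on
# replace to downcast (a no-op if replace already produced float64).
df = df.infer_objects()

print(df.dtypes)
print(df)
```

Doing the conversion as a separate step keeps the result the same whether or not your pandas version downcasts inside replace.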

Answer 2

I didn't have any luck with the replace solution, but I was able to convert <NA> to np.nan by casting the column to float: df['my_col'] = df['my_col'].astype(float).
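The cast works directly when the column uses a nullable extension dtype such as Int64, whose missing values print as <NA>. A minimal sketch, assuming such a column (the name my_col comes from the answer; the data is made up for illustration). Series.to_numpy with na_value=np.nan is shown as an alternative that returns a plain NumPy array:

```python
import numpy as np
import pandas as pd

# Illustrative data: a nullable integer column whose missing value is pd.NA.
df = pd.DataFrame({"my_col": pd.array([pd.NA, 4, 8], dtype="Int64")})

# Casting to float64 maps pd.NA to np.nan.
df["my_col"] = df["my_col"].astype(float)
print(df["my_col"].dtype)  # float64

# Alternative: request a NumPy array and state what pd.NA should become.
arr = pd.Series([pd.NA, 4, 8], dtype="Int64").to_numpy(dtype=float, na_value=np.nan)
print(arr)
```

An object-dtype column holding pd.NA may not cast this cleanly, because each pd.NA object then has to be converted to a float individually; in that case the replace or where approaches are the safer choice.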

Answer 3

Using Pandas v1.3.1 and NumPy v1.20.3, you can use df.where(), which replaces values wherever the condition is False, as shown below:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'age':[pd.NA, 4, 8]})
>>> print(df)
    age
0  <NA>
1     4
2     8
>>> print(type(df.iloc[0]['age']))
<class 'pandas._libs.missing.NAType'>
>>> df = df.where(pd.notnull(df), np.nan)  # Replace pd.NA, None and np.nan with np.nan
>>> print(df)
   age
0  NaN
1    4
2    8
>>> print(type(df.iloc[0]['age']))
<class 'float'>

PS: You can also do:

>>> df = df.where(~pd.isna(df), np.nan)
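One caveat: on a column with a nullable extension dtype (for example Int64 or string), an np.nan written back by where may be stored as <NA> again, because those dtypes use pd.NA as their missing-value marker. A minimal sketch for mixed DataFrames, assuming it is acceptable to pass through object dtype first; the helper name na_to_nan is hypothetical:

```python
import numpy as np
import pandas as pd


def na_to_nan(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with missing values as np.nan.

    Columns are cast to object first so that np.nan is stored as-is and
    cannot be turned back into pd.NA by a nullable extension dtype.
    """
    # df.notna() is False wherever pd.NA (or None/NaN) appears.
    out = df.astype(object).where(df.notna(), np.nan)
    # Let pandas re-infer plain NumPy dtypes (e.g. float64 for numbers).
    return out.infer_objects()


df = pd.DataFrame(
    {
        "age": pd.array([pd.NA, 4, 8], dtype="Int64"),
        "name": pd.array(["a", pd.NA, "c"], dtype="string"),
    }
)
result = na_to_nan(df)
print(result.dtypes)
print(result)
```

The numeric column comes back as float64 with NaN, and the string column keeps its text with NaN in place of <NA>.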

3 answers

Answer 1: 0, 2021-09-27 09:47:22

Answer 2: 0, 2022-07-19 18:24:37

Answer 3: 0, 2022-07-19 18:36:38
