Pandas - How to identify "nan" values in a Series

Question

I am currently playing with the Kaggle Titanic dataset (train.csv).

I can load the data fine.
I understand that some entries in the Embarked column are nan. But when I try to filter for them using the following code, I get an empty array:

    import pandas as pd
    df = pd.read_csv(<file_loc>, header=0)
    df[df.Embarked == 'nan']

I tried importing numpy.nan to replace the string 'nan' above, but that doesn't work either.

What I am trying to find is all the cells that are not 'S', 'C' or 'Q'.

I also realised later, using type(df.Embarked.unique()[-1]), that the nan is a float. Could someone help me understand how to identify those nan cells?

Answer 1

NaN is used to represent missing values.
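
A likely reason the comparisons in the question come back empty is that NaN is a float that never compares equal to anything, including itself, so neither == 'nan' nor == numpy.nan can match a row. A minimal sketch, assuming a small hypothetical Series named embarked in place of the real df.Embarked column:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df.Embarked; the real column comes from train.csv.
embarked = pd.Series(["S", "C", np.nan, "Q"])

# NaN is a float, and under IEEE 754 rules it is not equal to itself.
nan_is_float = isinstance(np.nan, float)
nan_equals_itself = np.nan == np.nan

# Both equality checks therefore select no rows at all.
matches_string = int((embarked == "nan").sum())
matches_np_nan = int((embarked == np.nan).sum())

print(nan_is_float, nan_equals_itself)
print(matches_string, matches_np_nan)
```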

To find them, use .isna():

Detect missing values.

To replace them, use .fillna(value):

Fill NA/NaN values.

Some examples on a Series called col:

>>> col
0    1.0
1    NaN
2    2.0
dtype: float64
>>> col[col.isna()]
1   NaN
dtype: float64
>>> col.index[col.isna()]
Int64Index([1], dtype='int64')
>>> col.fillna(-1)
0    1.0
1   -1.0
2    2.0
dtype: float64
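
Applied to the question's case, the same idea might look like the sketch below. The tiny DataFrame is a hypothetical stand-in for train.csv (only the Embarked column name comes from the question; PassengerId is used here just to label rows). It shows two filters: one for missing values, and one for anything that is not 'S', 'C' or 'Q'.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the Titanic data; the real frame comes from train.csv.
df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Embarked": ["S", np.nan, "C", "Q"],
})

# Rows where Embarked is missing.
missing = df[df["Embarked"].isna()]

# Rows where Embarked is anything other than 'S', 'C' or 'Q'.
# NaN is included, because isin() reports False for it and ~ flips that to True.
unexpected = df[~df["Embarked"].isin(["S", "C", "Q"])]

print(missing)
print(unexpected)
```

The isin() variant is the broader check: it would also catch stray values such as an empty string, whereas isna() only reports genuinely missing entries.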

Answer 1 (2, accepted), 2021-10-02 10:44:14