Pandas - How to identify `nan` values in a Series
I am currently playing with the Kaggle Titanic dataset (train.csv).
Some of the data in the `Embarked` column has `nan` values. But when I tried to filter for them using the following code, I got an empty result:

import pandas as pd
df = pd.read_csv(<file_loc>, header=0)
df[df.Embarked == 'nan']
I tried importing `numpy.nan` to replace the string `'nan'` above, but that doesn't work either.
What I am trying to find is all the cells that are not 'S', 'C', or 'Q'.
I also realised later that the `nan` is a float, as `type(df.Embarked.unique()[-1])` shows. Could someone help me understand how to identify those `nan` cells?
`NaN` is used to represent missing values.
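This is why both filters in the question come back empty: the missing cell is not the string `'nan'`, and `NaN` never compares equal to anything, itself included. A minimal sketch, assuming a small hypothetical `embarked` Series in place of `df.Embarked`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df.Embarked; read_csv stores empty cells as float NaN
embarked = pd.Series(["S", "C", "Q", np.nan, "S"])

# Neither equality test matches: the cell is not the string 'nan', and NaN != NaN
by_string = embarked[embarked == "nan"]
by_nan = embarked[embarked == np.nan]

# isna() is the supported test for missing values
by_isna = embarked[embarked.isna()]

# The value unique() returns is a plain Python float; pd.isna also accepts scalars
last = embarked.unique()[-1]

print(len(by_string), len(by_nan), len(by_isna))  # 0 0 1
print(pd.isna(last), last == last)                # True False
```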
`.isna()`: Detect missing values.

`.fillna(value)`: Fill NA/NaN values.
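Applied to a column shaped like `Embarked`, the two methods might look like the sketch below. The frame is hypothetical, and the fill value `'Unknown'` is an arbitrary placeholder chosen for illustration, not something the data prescribes:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Titanic Embarked column
df = pd.DataFrame({"Embarked": ["S", np.nan, "C", "Q", np.nan]})

# Rows whose Embarked value is missing
missing_rows = df[df.Embarked.isna()]

# Number of missing values (True counts as 1 when summed)
n_missing = df.Embarked.isna().sum()

# Replace missing values with a placeholder ('Unknown' is an arbitrary choice)
filled = df.Embarked.fillna("Unknown")

print(missing_rows.index.tolist())  # [1, 4]
print(n_missing)                    # 2
print(filled.tolist())              # ['S', 'Unknown', 'C', 'Q', 'Unknown']
```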
Some examples on a Series called `col`:
>>> col
0 1.0
1 NaN
2 2.0
dtype: float64
>>> col[col.isna()]
1 NaN
dtype: float64
>>> col.index[col.isna()]
Int64Index([1], dtype='int64')
>>> col.fillna(-1)
0 1.0
1 -1.0
2 2.0
dtype: float64
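For the question's actual goal (every cell that is not 'S', 'C', or 'Q'), negating `Series.isin` is one option. Because `isin` reports `False` for `NaN`, the negated mask keeps the missing cells along with any other unexpected value. A sketch on a hypothetical frame, where `"X"` is an invented code used only to show the second case:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train.csv; "X" is an invented unexpected code
df = pd.DataFrame({"Embarked": ["S", "C", np.nan, "Q", "X", "S"]})

# isin() returns False for NaN, so negating it keeps missing cells as well
others = df[~df.Embarked.isin(["S", "C", "Q"])]

print(others.index.tolist())  # [2, 4]
```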