Removing NaN values in Python pandas

Question

The data is adult income from census data. Rows look like:

 31, Private, 84154, Some-college, 10, Married-civ-spouse, Sales, Husband, White, Male, 0, 0, 38, NaN, >50K
 48, Self-emp-not-inc, 265477, Assoc-acdm, 12, Married-civ-spouse, Prof-specialty, Husband, White, Male, 0, 0, 40, United-States, <=50K

I'm trying to remove all rows with NaNs from a DataFrame loaded from a CSV file in pandas.

 >>> import pandas as pd
 >>> income = pd.read_csv('income.data')
 >>> income['type'].unique()
 array([ State-gov, Self-emp-not-inc, Private, Federal-gov, Local-gov,
        NaN, Self-emp-inc, Without-pay, Never-worked], dtype=object)
 >>> income.dropna(how='any')  # should drop all rows with NaNs
 >>> income['type'].unique()
 array([ State-gov, Self-emp-not-inc, Private, Federal-gov, Local-gov,
        NaN, Self-emp-inc, Without-pay, Never-worked], dtype=object)  # what??
 >>> income = income.dropna(how='any')  # ok, maybe reassignment will work?
 >>> income['type'].unique()
 array([ State-gov, Self-emp-not-inc, Private, Federal-gov, Local-gov,
        NaN, Self-emp-inc, Without-pay, Never-worked], dtype=object)  # what??

I tried with a smaller example.csv:

 label,age,sex
 1,43,M
 -1,NaN,F
 1,65,NaN

And dropna() worked just fine here for both categorical and numerical NaNs. What is going on? I'm new to pandas, just learning the ropes.

Answer 1

As I wrote in the comment: the "NaN" has a leading whitespace (at least in the data you provided), so pandas reads it as the ordinary string " NaN" rather than as a missing value. Therefore, you need to specify the na_values parameter in the read_csv function.

Try this one:

 df = pd.read_csv("income.csv",header=None,na_values=" NaN")

This is also why your second example works: there is no leading whitespace there.
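To see the effect without the original file, here is a minimal sketch that reads two shortened, made-up rows in the same style from an in-memory string. The `skipinitialspace=True` variant is not part of the answer above; it is an alternative `read_csv` option that strips the space after each comma, so it should also turn " NaN" into a real missing value.

```python
import io

import pandas as pd

# Hypothetical, shortened rows in the style of the question's data:
# every field after the first has a leading space, including " NaN".
raw = (
    "31, Private, 84154, NaN, >50K\n"
    "48, Self-emp-not-inc, 265477, United-States, <=50K\n"
)

# Default parsing: " NaN" is not a recognised NA marker, so it stays a string.
plain = pd.read_csv(io.StringIO(raw), header=None)

# The fix from the answer: register " NaN" (with the space) as an NA marker.
with_na_values = pd.read_csv(io.StringIO(raw), header=None, na_values=" NaN")

# Alternative (an assumption, not from the answer): strip the space after each comma.
stripped = pd.read_csv(io.StringIO(raw), header=None, skipinitialspace=True)

print(plain[3].isna().sum())           # missing values seen with default parsing
print(with_na_values[3].isna().sum())  # missing values seen with na_values=" NaN"
print(stripped[3].isna().sum())        # missing values seen with skipinitialspace=True
print(len(with_na_values.dropna()))    # rows left after dropping rows with NaN
```

One difference worth noting: `na_values=" NaN"` leaves the leading space on every other string field (for example " Private"), while `skipinitialspace=True` removes it everywhere.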

Answer 2

Drop all rows with NaN values

 df2=df.dropna()
 df2=df.dropna(axis=0)  # equivalent: axis=0 (rows) is the default

Reset the index after dropping

df2=df.dropna().reset_index(drop=True)

Drop rows in which all values are NaN

df2=df.dropna(how='all')

Drop rows that have NaN values in selected columns

df2=df.dropna(subset=['length','Height'])
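The variants above can be tried on a small frame. The following is a sketch on made-up data; the `label` column and all values are assumptions for illustration, while `length` and `Height` reuse the column names from the snippet above. It also shows that `dropna` returns a new DataFrame and leaves the original untouched.

```python
import numpy as np
import pandas as pd

# Made-up data: row 0 is complete, rows 1 and 2 are partly missing,
# row 3 is missing in every column.
df = pd.DataFrame(
    {
        "length": [1.0, np.nan, 3.0, np.nan],
        "Height": [10.0, 20.0, np.nan, np.nan],
        "label": ["a", "b", "c", None],
    }
)

# Drop every row that contains at least one missing value.
any_nan_dropped = df.dropna()

# Drop only rows in which all values are missing.
all_nan_dropped = df.dropna(how="all")

# Drop rows with a missing value in the selected column(s) only.
subset_dropped = df.dropna(subset=["length"])

# Renumber the remaining rows from 0 after dropping.
reindexed = subset_dropped.reset_index(drop=True)

print(len(df), len(any_nan_dropped), len(all_nan_dropped))
print(list(subset_dropped.index), list(reindexed.index))
```

Because the result is a new object, it has to be assigned to a name (as `df2` is above) for the change to be kept.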


2 answers

Answer 1
8, accepted, 2013-11-18 17:43:34

Answer 2
-1, 2021-03-18 19:51:08
