Removing NaN values in python pandas
The data is adult income from census data; rows look like:
31, Private, 84154, Some-college, 10, Married-civ-spouse, Sales, Husband, White, Male, 0, 0, 38, NaN, >50K
48, Self-emp-not-inc, 265477, Assoc-acdm, 12, Married-civ-spouse, Prof-specialty, Husband, White, Male, 0, 0, 40, United-States, <=50K
I'm trying to remove all rows with NaNs from a DataFrame loaded from a CSV file in pandas.
>>> import pandas as pd
>>> income = pd.read_csv('income.data')
>>> income['type'].unique()
array([ State-gov, Self-emp-not-inc, Private, Federal-gov, Local-gov,
       NaN, Self-emp-inc, Without-pay, Never-worked], dtype=object)
>>> income.dropna(how='any')  # should drop all rows with NaNs
>>> income['type'].unique()
array([ State-gov, Self-emp-not-inc, Private, Federal-gov, Local-gov,
       NaN, Self-emp-inc, Without-pay, Never-worked], dtype=object)  # what??
>>> income = income.dropna(how='any')  # ok, maybe reassignment will work?
>>> income['type'].unique()
array([ State-gov, Self-emp-not-inc, Private, Federal-gov, Local-gov,
       NaN, Self-emp-inc, Without-pay, Never-worked], dtype=object)  # what??
I tried with a smaller example.csv:
label,age,sex
1,43,M
-1,NaN,F
1,65,NaN
And dropna() worked just fine here for both categorical and numerical NaNs. What is going on? I'm new to Pandas, just learning the ropes.
As I wrote in the comment: the "NaN" has a leading whitespace (at least in the data you provided). Therefore, you need to specify the na_values parameter in the read_csv function.
Try this one:
df = pd.read_csv("income.csv",header=None,na_values=" NaN")
This is why your second example works: there is no leading whitespace there, so pandas recognizes "NaN" as a missing value on its own.
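To see the effect in isolation, here is a minimal sketch. The two-row sample is made up for illustration (shortened rows in the question's ", "-separated format), and skipinitialspace=True is an alternative I am adding that the answer above does not mention; it strips the space after each delimiter so the default "NaN" marker matches.

```python
import io
import pandas as pd

# Hypothetical two-row sample in the question's format: fields are separated
# by ", " so the missing value arrives as " NaN" with a leading space.
raw = (
    "31, Private, 84154, NaN, >50K\n"
    "48, Self-emp-not-inc, 265477, United-States, <=50K\n"
)

# Default parsing: " NaN" is not a recognized missing-value marker,
# so it is kept as an ordinary string and dropna() has nothing to drop.
plain = pd.read_csv(io.StringIO(raw), header=None)
missing_plain = int(plain.isna().sum().sum())

# Option 1 (from the answer): declare " NaN" as an extra missing-value marker.
with_na_values = pd.read_csv(io.StringIO(raw), header=None, na_values=" NaN")
rows_na_values = len(with_na_values.dropna())

# Option 2 (an assumption of this sketch): strip the space after each
# delimiter so the default "NaN" marker matches.
stripped = pd.read_csv(io.StringIO(raw), header=None, skipinitialspace=True)
rows_stripped = len(stripped.dropna())

print(missing_plain, rows_na_values, rows_stripped)
```

With skipinitialspace=True the other string columns also lose their leading space (" Private" becomes "Private"), which is usually what you want for this kind of file.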
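df2 = df.dropna()          # drop rows containing any NaN
df2 = df.dropna(axis=0)    # same as above; axis=0 (rows) is the default
df2 = df.dropna().reset_index(drop=True)   # drop rows, then renumber the index from 0
df2 = df.dropna(how='all')                 # drop only rows where every value is NaN
df2 = df.dropna(subset=['length','Height'])  # drop rows with NaN in either of these columns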
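A small sketch of how these dropna variants differ. The frame below is made up for illustration; only the 'length' and 'Height' column names come from the snippet above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; 'length' and 'Height' mirror the column names used above.
df = pd.DataFrame({
    "length": [1.0, np.nan, 3.0, np.nan],
    "Height": [10.0, 20.0, np.nan, np.nan],
    "label": ["a", "b", "c", None],
})

# Keeps only rows with no missing value in any column.
any_missing = df.dropna()

# Drops only rows in which every column is missing.
all_missing = df.dropna(how="all")

# Considers the 'length' column only; the original index labels are kept.
by_subset = df.dropna(subset=["length"])

# Same rows, but with the index renumbered from 0.
renumbered = df.dropna(subset=["length"]).reset_index(drop=True)

print(len(any_missing), len(all_missing), list(by_subset.index), list(renumbered.index))
```

Note that dropna returns a new DataFrame and leaves df itself unchanged, so the result has to be assigned to a name, as in the snippet above.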