简体   繁体   中英

removing NaN values in python pandas

Data is of income of adults from census data, rows look like:

 31, Private, 84154, Some-college, 10, Married-civ-spouse, Sales, Husband, White, Male, 0, 0, 38, NaN, >50K 48, Self-emp-not-inc, 265477, Assoc-acdm, 12, Married-civ-spouse, Prof-specialty, Husband, White, Male, 0, 0, 40, United-States, <=50K

I'm trying to remove all rows with NaNs from a DataFrame loaded from a CSV file in pandas.

 >>> import pandas as pd >>> income = pd.read_csv('income.data') >>> income['type'].unique() array([ State-gov, Self-emp-not-inc, Private, Federal-gov, Local-gov, NaN, Self-emp-inc, Without-pay, Never-worked], dtype=object) >>> income.dropna(how='any') # should drop all rows with NaNs >>> income['type'].unique() array([ State-gov, Self-emp-not-inc, Private, Federal-gov, Local-gov, NaN, Self-emp-inc, Without-pay, Never-worked], dtype=object) Self-emp-inc, nan], dtype=object) # what?? >>> income = income.dropna(how='any') # ok, maybe reassignment will work? >>> income['type'].unique() array([ State-gov, Self-emp-not-inc, Private, Federal-gov, Local-gov, NaN, Self-emp-inc, Without-pay, Never-worked], dtype=object) # what??

I tried with a smaller example.csv :

 label,age,sex 1,43,M -1,NaN,F 1,65,NaN

And dropna() worked just fine here for both categorical and numerical NaNs. What is going on? I'm new to Pandas, just learning the ropes.

As I wrote in the comment: The "NaN" has a leading whitespace (at least in the data you provided). Therefore, you need to specifiy the na_values paramter in the read_csv function.

Try this one:

 df = pd.read_csv("income.csv",header=None,na_values=" NaN")

This is why your second example works, because there is no leading whitespace here.

Drop all rows with NaN values

df2=df.dropna() df2=df.dropna(axis=0)

Reset index after drop

df2=df.dropna().reset_index(drop=True)

Drop row that has all NaN values

df2=df.dropna(how='all')

Drop rows that has NaN values on selected columns

df2=df.dropna(subset=['length','Height'])

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM