Removing NaN values in Python pandas

Question

The data is adult income taken from census data. Rows look like:

    31, Private, 84154, Some-college, 10, Married-civ-spouse, Sales, Husband, White, Male, 0, 0, 38, NaN, >50K
    48, Self-emp-not-inc, 265477, Assoc-acdm, 12, Married-civ-spouse, Prof-specialty, Husband, White, Male, 0, 0, 40, United-States, <=50K

I'm trying to remove all rows with NaNs from a DataFrame loaded from a CSV file in pandas.

    >>> import pandas as pd
    >>> income = pd.read_csv('income.data')
    >>> income['type'].unique()
    array([ State-gov, Self-emp-not-inc, Private, Federal-gov, Local-gov, NaN, Self-emp-inc, Without-pay, Never-worked], dtype=object)
    >>> income.dropna(how='any')  # should drop all rows with NaNs
    >>> income['type'].unique()
    array([ State-gov, Self-emp-not-inc, Private, Federal-gov, Local-gov, NaN, Self-emp-inc, Without-pay, Never-worked], dtype=object)  # what??
    >>> income = income.dropna(how='any')  # ok, maybe reassignment will work?
    >>> income['type'].unique()
    array([ State-gov, Self-emp-not-inc, Private, Federal-gov, Local-gov, NaN, Self-emp-inc, Without-pay, Never-worked], dtype=object)  # what??

I tried with a smaller example.csv:

    label,age,sex
    1,43,M
    -1,NaN,F
    1,65,NaN

And dropna() worked just fine here for both categorical and numerical NaNs. What is going on? I'm new to Pandas, just learning the ropes.

Answer 1

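As I wrote in the comment: the "NaN" has a leading whitespace (at least in the data you provided), so pandas reads it as the ordinary string " NaN" rather than as a missing value, and dropna() has nothing to drop. Therefore, you need to specify the na_values parameter in the read_csv function.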

Try this one:

    df = pd.read_csv("income.csv", header=None, na_values=" NaN")

This is also why your second example works: there is no leading whitespace before NaN in that file.
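To see the effect in isolation, here is a minimal sketch. The three-column sample string is made up for illustration (it only mimics the ", " separators of the income file), and `skipinitialspace=True` is an alternative I am adding here, not part of the fix above: it strips the whitespace after each delimiter so that the default NaN markers match again.

```python
import io
import pandas as pd

# Hypothetical sample that mimics the ", " separators of the income file.
raw = "31, Private, NaN\n48, Self-emp-not-inc, United-States\n"

# Default parsing keeps the leading space, so " NaN" stays an ordinary string.
plain = pd.read_csv(io.StringIO(raw), header=None)
n_plain = len(plain.dropna())

# Option 1: declare " NaN" as a missing-value marker (the fix from this answer).
marked = pd.read_csv(io.StringIO(raw), header=None, na_values=" NaN")
n_marked = len(marked.dropna())

# Option 2 (assumption, not from the answer): strip whitespace after the
# delimiter so that "NaN" matches the default markers.
stripped = pd.read_csv(io.StringIO(raw), header=None, skipinitialspace=True)
n_stripped = len(stripped.dropna())

print(n_plain, n_marked, n_stripped)
```

With the first option the other string values keep their leading space (" Private"); the second option removes it from every field, which may or may not be what you want.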

Answer 2

Drop all rows with NaN values

df2=df.dropna()
df2=df.dropna(axis=0)  # equivalent: axis=0 (rows) is the default

Reset index after drop

df2=df.dropna().reset_index(drop=True)
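A small sketch of what the reset does, using a made-up frame loosely based on the example.csv from the question. It also shows that dropna() returns a new DataFrame and leaves the original untouched unless you assign the result.

```python
import numpy as np
import pandas as pd

# Hypothetical data, loosely based on example.csv from the question.
df = pd.DataFrame({
    "label": [1, -1, 1, -1],
    "age": [43, np.nan, 65, 30],
    "sex": ["M", "F", "M", "F"],
})

kept = df.dropna()                               # new DataFrame; df is unchanged
renumbered = df.dropna().reset_index(drop=True)  # index restarts at 0

print(len(df), len(kept))      # the original still has every row
print(list(kept.index))        # surviving rows keep their old labels
print(list(renumbered.index))  # consecutive labels after the reset
```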

Drop rows in which all values are NaN

df2=df.dropna(how='all')

Drop rows that have NaN values in selected columns

df2=df.dropna(subset=['length','Height'])
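The difference between the variants is easiest to see side by side. This is only a sketch on invented data: 'length' and 'Height' are the column names from the snippet above, while the 'width' column and all values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data; only 'length' and 'Height' come from the snippet above.
df = pd.DataFrame({
    "length": [1.0, np.nan, np.nan, 4.0],
    "Height": [2.0, 5.0, np.nan, 6.0],
    "width": [np.nan, 7.0, np.nan, 8.0],
})

any_nan = df.dropna()                              # how='any' is the default
all_nan = df.dropna(how="all")                     # drops only the fully empty row
selected = df.dropna(subset=["length", "Height"])  # ignores NaN in 'width'

print(list(any_nan.index))
print(list(all_nan.index))
print(list(selected.index))
```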

2 answers

solution1
8 ACCEPTED 2013-11-18 17:43:34

solution2
-1 2021-03-18 19:51:08
