I have a 12064-row x 220-column CSV file in which null values are written as '\N'. I'm reading the CSV data into a pandas DataFrame via: df = pd.read_csv('my_csv')
What is the best way to treat/remove the null values so that I can perform downstream analysis on the data? I'm thinking it might be best to convert the '\N' strings to NaN and then use the df.dropna() method. If this is the best option, how would I do this? Thanks.
"I'm thinking it might be best to convert the '\N' strings to NaN and then use the df.dropna() method"
pandas.read_csv() has a parameter for exactly this:

na_values : list-like or dict, default None
    Additional strings to recognize as NA/NaN. If a dict is passed, it specifies per-column NA values.
So, e.g.:

pd.read_csv('my.csv', na_values=[r'\N'])

Note the raw string r'\N': a bare '\N' is an invalid escape sequence in Python 3, so use r'\N' or '\\N' to match the literal backslash-N in the file.
Then you can call dropna() on the resulting DataFrame.
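Putting the two steps together, here is a minimal, self-contained sketch (it reads from an in-memory string standing in for your actual file, and assumes '\N' is the only null marker in the data):

```python
import io

import pandas as pd

# Stand-in for the real CSV file; "\\N" in this source literal is the
# two characters backslash-N, i.e. the null marker described above.
csv_data = "a,b,c\n1,\\N,3\n4,5,\\N\n7,8,9\n"

# Tell read_csv to treat literal backslash-N as NaN while parsing.
df = pd.read_csv(io.StringIO(csv_data), na_values=[r"\N"])

# Both '\N' cells were recognized as NaN.
print(df.isna().sum().sum())  # → 2

# Drop every row that contains at least one NaN.
clean = df.dropna()
print(len(clean))  # → 1
```

For a 12064 x 220 file, you may prefer df.dropna(subset=[...]) on the columns that matter, so a stray null in an unused column doesn't discard the whole row.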
Reference: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html