How to remove '\N' null values from a CSV loaded into a pandas DataFrame

I have a CSV file with 12,064 rows and 220 columns, in which some null values are written in the form '\\N'. I'm reading it into a pandas DataFrame with: df = pd.read_csv('my_csv')

What is the best way to treat/remove the null values so that I can perform downstream analysis on the data? I'm thinking perhaps it might be best to convert the '\\N' string to 'NaN' and use the df.dropna() method. If this is the best option, how would I do this? Thanks.
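If the data has already been read without any extra NA markers, the convert-then-dropna idea from the question can be sketched with DataFrame.replace. This is a minimal sketch on a small made-up CSV (the columns id and score are hypothetical). Because the affected column was parsed as text, it still needs a numeric conversion after the replacement.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample data; each "\\N" in this Python string is the two characters \N
csv_text = "id,score\n1,4.0\n2,\\N\n3,6.5\n"
df = pd.read_csv(io.StringIO(csv_text))

# Replace cells that are exactly the string \N with NaN (literal match, not a regex)
df = df.replace(r"\N", np.nan)

# score was parsed as strings because of \N, so convert it to numbers
df["score"] = pd.to_numeric(df["score"])

# Drop every row that contains at least one NaN
clean = df.dropna()
print(clean)
```

Re-reading the file with na_values is usually cleaner, because pandas then parses the numeric columns correctly in one step.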

I'm thinking perhaps it might be best to convert the '\\N' string to 'NaN' and use the df.dropna() method

pandas.read_csv() has a parameter for exactly that:

na_values : list-like or dict, default None
Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values
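The dict form mentioned in the documentation limits the extra NA strings to specific columns. A minimal sketch, assuming a hypothetical file where only the score column uses \N for missing values, while a text column named code should keep \N as a literal value:

```python
import io

import pandas as pd

# Hypothetical CSV: \N means "missing" in score, but is an ordinary value in code
csv_text = "id,score,code\n1,\\N,\\N\n2,7.5,AB\n3,\\N,CD\n"

# Only the "score" column treats \N as NaN; "code" keeps it as a string
df = pd.read_csv(io.StringIO(csv_text), na_values={"score": [r"\N"]})

print(df.dtypes)
print(df)
```

The built-in NA strings (such as "NA" or "NULL") still apply on top of these unless keep_default_na=False is passed.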

So, for example:

import pandas as pd
df = pd.read_csv('my.csv', na_values=[r'\N'])  # raw string: a bare '\N' is a SyntaxError in Python 3
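The line above needs a real file. The sketch below swaps in an in-memory CSV (hypothetical data read through io.StringIO) so the effect of na_values on column dtypes and missing-value counts can be checked directly:

```python
import io

import pandas as pd

# Hypothetical CSV text; each "\\N" below is the two characters \N in the data
csv_text = "a,b,c\n1,\\N,x\n2,3.5,\\N\n\\N,4.0,z\n"

# Default parsing: \N is not a recognised NA marker, so it stays a string
# and prevents the affected columns from being parsed as numbers
raw = pd.read_csv(io.StringIO(csv_text))

# With na_values, \N becomes NaN and numeric columns are parsed as float64
df = pd.read_csv(io.StringIO(csv_text), na_values=[r"\N"])

print(raw.dtypes)
print(df.dtypes)
print(df.isna().sum())  # number of missing values per column
```

If the file literally contains two backslashes before the N, the marker to pass would be r'\\N' instead.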

Then you can call df.dropna() on the result.
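By default, dropna() removes every row that contains at least one NaN (how='any'), which on a 220-column table may discard far more rows than intended. The standard keyword arguments below narrow that behaviour; the small frame and its column names are hypothetical stand-ins for the data after reading with na_values.

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for the parsed CSV
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [np.nan, np.nan, 6.0, 8.0],
    "c": [np.nan, np.nan, 9.0, 10.0],
})

any_nan = df.dropna()                # drop rows containing any NaN
all_nan = df.dropna(how="all")       # drop only rows that are entirely NaN
by_subset = df.dropna(subset=["a"])  # look for NaN in column "a" only
at_least_2 = df.dropna(thresh=2)     # keep rows with at least 2 non-NaN values

print(len(any_nan), len(all_nan), len(by_subset), len(at_least_2))  # 1 3 2 2
```

Which option fits depends on whether a missing value in any column really makes the whole row unusable for the downstream analysis.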

Reference: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
