I have a 12064-row x 220-column CSV file in which null values are written as '\N'. I'm reading the CSV data into a pandas DataFrame via: df = pd.read_csv('my_csv')
What is the best way to treat/remove the null values so that I can perform downstream analysis on the data? I'm thinking it might be best to convert the '\N' strings to NaN and then use the df.dropna() method. If this is the best option, how would I do this? Thanks.
"I'm thinking it might be best to convert the '\N' strings to NaN and then use the df.dropna() method"
pandas.read_csv() has a parameter for exactly this:

na_values : list-like or dict, default None
    Additional strings to recognize as NA/NaN. If a dict is passed, it specifies per-column NA values.
So, e.g.:

pd.read_csv('my.csv', na_values=[r'\N'])

Note the raw string r'\N': a bare '\N' is an invalid escape sequence in Python 3, so use r'\N' or '\\N' to match the literal backslash-N in the file.
Then you can call dropna() on the resulting DataFrame.
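Putting the two steps together, here is a minimal, self-contained sketch (it reads from an in-memory string standing in for your actual file, and assumes '\N' is the only null marker in the data):

```python
import io

import pandas as pd

# Stand-in for the real CSV file; "\\N" in this source literal is the
# two characters backslash-N, i.e. the null marker described above.
csv_data = "a,b,c\n1,\\N,3\n4,5,\\N\n7,8,9\n"

# Tell read_csv to treat literal backslash-N as NaN while parsing.
df = pd.read_csv(io.StringIO(csv_data), na_values=[r"\N"])

# Both '\N' cells were recognized as NaN.
print(df.isna().sum().sum())  # → 2

# Drop every row that contains at least one NaN.
clean = df.dropna()
print(len(clean))  # → 1
```

For a 12064 x 220 file, you may prefer df.dropna(subset=[...]) on the columns that matter, so a stray null in an unused column doesn't discard the whole row.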
Reference: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html