I tried a couple methods to clean rows containing NaN from a particular Series in my DataFrame, only to realize every NaN entry is a 'NaN' string, not a null value.
In my specific example, each row represents a country, so I want to remove from the DataFrame all countries that do not have a GDP value in the 'GDP per Capita' column.
Some things I tried (that failed):
df_noGDP = df
df_noGDP.dropna(axis=0, subset=['GDP per Capita'])
and
df_noGDP = df.loc[df['GDP per Capita'] != np.nan]
When I call df_noGDP, I see that no NaN values are removed. I figure I'm either making a silly syntax error somewhere or I need to convert my data types.
Do:
df_noGDP=df_noGDP.replace('NaN',np.nan)
Or:
df_noGDP.replace('NaN', np.nan, inplace=True)
Then your stuff would work as expected.
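To see why the replacement is needed, here is a minimal sketch on a made-up three-row frame. The country names and GDP figures are hypothetical; only the 'GDP per Capita' column name comes from the question. It shows dropna doing nothing while the values are strings, and working once they are real missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; 'NaN' here is a plain string, as in the question.
df = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "GDP per Capita": ["1000", "NaN", "2500"],
})

# dropna finds nothing to drop, because the string 'NaN' is not a missing value.
before = df.dropna(subset=["GDP per Capita"])

# Replace the string with a real missing value, then drop those rows.
cleaned = df.replace("NaN", np.nan).dropna(subset=["GDP per Capita"])

print(len(before), len(cleaned))
```

Note that both replace and dropna return new objects here, so the results have to be assigned to a name for the change to stick.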
First convert your strings to NaN values:
df = df.replace('NaN', np.nan)
Then assign back or specify your method to be in-place:
df = df.dropna(subset=['GDP per Capita']) # not in place version
df.dropna(subset=['GDP per Capita'], inplace=True) # in place version
Alternatively, use loc with notnull, since NaN != NaN by design:
df = df.loc[df['GDP per Capita'].notnull()]
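The following is a small sketch of why the != np.nan comparison from the question cannot work, while notnull does. It assumes the strings have already been converted to real NaN values; the sample rows are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a real missing value in the second row.
df = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "GDP per Capita": [1000.0, np.nan, 2500.0],
})

# NaN never compares equal to anything, itself included,
# so this mask is True for every row and filters nothing.
ne_mask = df["GDP per Capita"] != np.nan

# notnull() tests for missing values directly.
ok_mask = df["GDP per Capita"].notnull()

kept = df.loc[ok_mask]
print(ne_mask.tolist(), ok_mask.tolist())
```

Because ne_mask is True everywhere, df.loc[ne_mask] returns every row, which matches what the question observed.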