How to convert 'NaN' strings in a pandas Series to null values for dropna?
I tried a couple of methods to drop rows containing NaN in a particular Series of my DataFrame, only to realize that every NaN entry is the string 'NaN', not a null value.
In my specific example, each row represents a country, so I want to remove from the DataFrame all countries that have no GDP value in the 'GDP per Capita' column.
Some things I tried (that failed):
df_noGDP = df
df_noGDP.dropna(axis=0, subset=['GDP per Capita'])
and
df_noGDP = df.loc[df['GDP per Capita'] != np.nan]
When I call df_noGDP, I see that no NaN values are removed. I figure I'm either making a silly syntax error somewhere or I need to convert my data types.
Do:
df_noGDP=df_noGDP.replace('NaN',np.nan)
Or:
df_noGDP.replace('NaN', np.nan, inplace=True)
Then your code should work as expected.
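To see why the replacement matters, here is a minimal sketch on made-up data (the 'Country' column and all values are assumptions, not the asker's dataset). It checks how many entries pandas treats as null before and after the replacement, and shows that a `!= np.nan` comparison keeps every row.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the 'GDP per Capita' name comes from the question.
df = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "GDP per Capita": ["1000", "NaN", "2500"],
})

# The string 'NaN' is ordinary text, so pandas does not count it as missing.
nulls_before = int(df["GDP per Capita"].isna().sum())

# Comparing with != np.nan is True for every row, so this mask filters nothing.
kept_by_comparison = int((df["GDP per Capita"] != np.nan).sum())

# After swapping the string for a real NaN, the entry is recognised as null.
df_fixed = df.replace("NaN", np.nan)
nulls_after = int(df_fixed["GDP per Capita"].isna().sum())

print(nulls_before, kept_by_comparison, nulls_after)
```

Note that the `!= np.nan` filter from the question keeps every row even after the replacement, so dropna (with the result assigned back) is the part that actually removes them.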
First convert your strings to NaN values:
df = df.replace('NaN', np.nan)
Then assign the result back, or tell the method to operate in place:
df = df.dropna(subset=['GDP per Capita']) # not in place version
df.dropna(subset=['GDP per Capita'], inplace=True) # in place version
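Put together, the two steps might look like the sketch below. The sample rows are invented for illustration, and the name `df_clean` is hypothetical. It also shows that dropna leaves the original frame untouched unless the result is assigned or `inplace=True` is passed.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with 'NaN' stored as a string.
df = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "GDP per Capita": ["1000", "NaN", "2500", "NaN"],
})

# Step 1: turn the 'NaN' strings into real null values.
df = df.replace("NaN", np.nan)

# Step 2: drop rows whose GDP is null and keep the returned frame.
df_clean = df.dropna(subset=["GDP per Capita"])

rows_in_original = len(df)                # dropna did not modify df itself
remaining = df_clean["Country"].tolist()  # countries that still have a GDP value

print(rows_in_original, remaining)
```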
Alternatively, use loc with notnull, since NaN != NaN by design:
df = df.loc[df['GDP per Capita'].notnull()]
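The following sketch, on an invented Series, illustrates why an equality-style comparison cannot find NaN and how the notnull mask differs. The variable names are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# NaN compares unequal to everything, including itself.
nan_equals_itself = (np.nan == np.nan)

# Hypothetical column that already contains a real NaN.
gdp = pd.Series([1000.0, np.nan, 2500.0], name="GDP per Capita")

# '!= np.nan' is True everywhere, so it cannot single out the missing row.
comparison_mask = (gdp != np.nan).tolist()

# notnull() marks exactly the rows that hold a value.
notnull_mask = gdp.notnull().tolist()
kept = gdp.loc[gdp.notnull()].tolist()

print(nan_equals_itself, comparison_mask, notnull_mask, kept)
```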