Pandas dropna() function not working
I am trying to drop NA values from a pandas DataFrame. I have used dropna(), which should drop all rows containing NA from the DataFrame. Yet it does not work.
Here is the code:
import pandas as pd
import numpy as np
prison_data = pd.read_csv('https://andrewshinsuke.me/docs/compas-scores-two-years.csv')
That's how you get the DataFrame. As the following shows, the default read_csv method does indeed convert the NA data points to np.nan.
np.isnan(prison_data.head()['out_custody'][4])
Out[2]: True
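For readers who cannot download the file, the same check can be sketched on a tiny in-memory CSV. The CSV text and the name sample below are made up for illustration and are not taken from the real dataset; the sketch only relies on read_csv treating empty fields and the literal string NA as missing by default.

```python
import io

import pandas as pd

# Hypothetical two-column CSV (not the real dataset): one empty field and one
# literal "NA", both of which read_csv treats as missing by default.
csv_text = "id,out_custody\n1,2014-07-14\n2,\n3,NA\n"
sample = pd.read_csv(io.StringIO(csv_text))

# Count the missing values in each column.
na_counts = sample.isna().sum()
print(na_counts)
```

Calling isna().sum() on the whole frame is usually a quicker way to see where the missing values are than probing single cells with np.isnan.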
Conveniently, the head() of the DataFrame already contains NaN values (in the column out_custody), so printing prison_data.head() gives:
id name first last compas_screening_date sex
0 1 miguel hernandez miguel hernandez 2013-08-14 Male
1 3 kevon dixon kevon dixon 2013-01-27 Male
2 4 ed philo ed philo 2013-04-14 Male
3 5 marcu brown marcu brown 2013-01-13 Male
4 6 bouthy pierrelouis bouthy pierrelouis 2013-03-26 Male
dob age age_cat race ...
0 1947-04-18 69 Greater than 45 Other ...
1 1982-01-22 34 25 - 45 African-American ...
2 1991-05-14 24 Less than 25 African-American ...
3 1993-01-21 23 Less than 25 African-American ...
4 1973-01-22 43 25 - 45 Other ...
v_decile_score v_score_text v_screening_date in_custody out_custody
0 1 Low 2013-08-14 2014-07-07 2014-07-14
1 1 Low 2013-01-27 2013-01-26 2013-02-05
2 3 Low 2013-04-14 2013-06-16 2013-06-16
3 6 Medium 2013-01-13 NaN NaN
4 1 Low 2013-03-26 NaN NaN
priors_count.1 start end event two_year_recid
0 0 0 327 0 0
1 0 9 159 1 1
2 4 0 63 0 1
3 1 0 1174 0 0
4 2 0 1102 0 0
However, running prison_data.dropna() does not change the DataFrame in any way.
prison_data.dropna()
np.isnan(prison_data.head()['out_custody'][4])
Out[3]: True
By default, df.dropna() returns a new DataFrame without the NaN values and leaves the original untouched. So you have to assign the result back to the variable:
df = df.dropna()
If you want it to modify df in place, you have to specify that explicitly:
df.dropna(inplace=True)
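The difference between discarding the result, assigning it, and using inplace can be sketched on a small made-up frame. The column names and values here are assumptions chosen for illustration, not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in column "a".
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", "z"]})

# The result is computed and thrown away; df itself is unchanged.
df.dropna()
rows_before = len(df)

# Assigning the result back is what actually removes the row.
df = df.dropna()
rows_after = len(df)

# With inplace=True the frame is modified directly and the call returns None,
# so the return value should not be assigned.
other = pd.DataFrame({"a": [np.nan, 2.0]})
returned = other.dropna(inplace=True)
rows_other = len(other)
```

Here rows_before stays at 3 while rows_after is 2, which is the behaviour the question ran into.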
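It does not work because every row contains at least one NaN, so dropna() with its default how='any' would remove every row.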
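If every row really does contain a NaN somewhere, a plain dropna() returns an empty frame, and it may be more useful to restrict which columns are checked. The sketch below uses a made-up frame; the notes column is hypothetical and exists only so that every row has a missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "notes" is always missing, so every row has a NaN.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "out_custody": ["2014-07-14", np.nan, np.nan],
    "notes": [np.nan, np.nan, np.nan],
})

# Default how="any": a single NaN is enough to drop a row, so nothing is left.
all_dropped = df.dropna()

# Only look at the column of interest when deciding which rows to drop.
by_subset = df.dropna(subset=["out_custody"])

# Alternatively, drop columns that are entirely missing before dropping rows.
no_empty_cols = df.dropna(axis=1, how="all")
```

Which of these fits depends on the analysis; subset is probably the closest match when only out_custody matters.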