
Issue with the dropna() function and alternatives to dropna()

I was learning to use the dropna() function in Python (pandas) in order to drop rows/columns that contain NaN or '?' values. However, even after trying various solutions found online, I couldn't drop the data, although I got no syntax errors.

I've tried the following solutions:

First Attempt

df1 = df.dropna()
df1

Continued

df1.dropna(inplace=1)
df1

The first part of the code gave me back the original data frame, unchanged.

The second part gave me the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
 in ()
----> 1 df1.dropna(inplace=1)
      2
      3 df1

~\Anaconda3\lib\site-packages\pandas\core\frame.py in dropna(self, axis, how, thresh, subset, inplace)
   4259         1  Batman  Batmobile 1940-04-25
   4260         """
-> 4261         inplace = validate_bool_kwarg(inplace, 'inplace')
   4262         if isinstance(axis, (tuple, list)):
   4263             # GH20987

~\Anaconda3\lib\site-packages\pandas\util\_validators.py in validate_bool_kwarg(value, arg_name)
    224         raise ValueError('For argument "{arg}" expected type bool, received '
    225                          'type {typ}.'.format(arg=arg_name,
--> 226                                               typ=type(value).__name__))
    227     return value
    228

ValueError: For argument "inplace" expected type bool, received type

Further, are there any better alternatives to the dropna() function?


EDIT 1

  1. Link to my Python notebook: Dealing with Missing Data.ipynb
  2. I tried changing the value of the inplace argument to True, but that gives me the following error:

NameError: name 'df1' is not defined

PS: All the errors and issues are visible in the code.

Link to the CSV file used: CSV


First, replace '?' with NaN, like this:

df.replace('?', np.nan)

Then drop all the missing values using dropna (the NaNs you just substituted above), like this:

df1 = df.dropna()
df1

and then use inplace to keep the DataFrame, now containing only valid entries, in the same variable, like this:

df1.dropna(inplace=True)
df1
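
The ValueError in the question comes from passing inplace=1: pandas checks that inplace is a real bool, so only True or False is accepted, even though 1 is truthy in Python. A minimal sketch of the difference, assuming a small made-up DataFrame (the column names and values are hypothetical, not the asker's data):

```python
import numpy as np
import pandas as pd

# Hypothetical data; not the asker's CSV.
df1 = pd.DataFrame({"make": ["audi", "bmw", "fiat"],
                    "price": [13950.0, np.nan, 7500.0]})

# An integer is rejected by the argument check, before any rows are touched.
try:
    df1.dropna(inplace=1)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# A real bool is accepted; df1 itself loses the row that contains NaN.
df1.dropna(inplace=True)
```

Writing df1 = df1.dropna() should give the same rows without using inplace at all.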

You should also add inplace=True to the replace call:

df.replace("?", np.nan, inplace = True)
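
Putting the steps together, here is a minimal sketch, assuming a made-up CSV that uses '?' as its missing-value marker (the column names and values are hypothetical stand-ins for the asker's file). It also illustrates a likely reason dropna() alone returned the original frame: '?' is an ordinary string, not NaN, so there is nothing for dropna() to drop until it has been replaced. As one possible alternative, the na_values option of read_csv can mark '?' as missing while the file is loaded.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV content standing in for the asker's file.
csv_text = "make,price\naudi,13950\nbmw,?\n?,7500\nfiat,6500\n"
df = pd.read_csv(io.StringIO(csv_text))

# '?' is an ordinary string, so dropna() finds nothing to drop yet.
rows_before = len(df.dropna())

# Replace the marker with NaN and keep the result by assigning it back
# (assigning has the same effect here as passing inplace=True).
df = df.replace("?", np.nan)
df1 = df.dropna()

# Alternative: let read_csv treat '?' as missing while loading.
df2 = pd.read_csv(io.StringIO(csv_text), na_values="?").dropna()
```

With na_values, the price column is parsed as numbers straight away, whereas after replace() it still holds strings and may need a separate conversion.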
