
How to remove rows in a Python data frame with a condition?

I have the following data:
df =

   Emp_Name   Leaves   Leave_Type   Salary   Performance
0  Christy      20      sick        3000.0    56.6
1  Rocky        10      Casual      kkkk      22.4
2  jenifer      50      Emergency   2500.6   '51.6'
3  Tom          10      sick        Nan       46.2
4  Harry        nn      Casual      1800.1   '58.3'
5  Julie        22      sick        3600.2   'unknown'
6  Sam          5       Casual      Nan       47.2
7  Mady         6       sick        unknown   Nan

Desired output:

   Emp_Name   Leaves   Leave_Type   Salary   Performance
0  Christy      20      sick        3000.0    56.6
1  jenifer      50      Emergency   2500.6    51.6
2  Tom          10      sick        Nan       46.2
3  Sam          5       Casual      Nan       47.2
4  Mady         6       sick        unknown   Nan

I want to delete records where there is a data type error in the numerical columns (Leaves, Salary, Performance).
If a numerical column contains a string, that row should be deleted from the data frame.

df[['Leaves','Salary','Performance']].apply(pd.to_numeric, errors = 'coerce')

But this converts the invalid values to NaN instead of removing the rows.

Let's start with a note about your sample data:

It contains Nan strings, which are not among the strings that pandas automatically recognizes as NaN. To treat them as NaN, I read the source text with read_fwf, passing na_values=['Nan'].
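A minimal sketch of that reading step, assuming the sample is available as aligned fixed-width text. The text below is the question's data with the index column dropped and the columns re-aligned, which is my assumption about the layout, not something given in the question:

```python
import io
import pandas as pd

# Sample data from the question, re-aligned into fixed-width columns
# (index column omitted; this layout is an assumption for illustration).
text = """\
Emp_Name  Leaves  Leave_Type  Salary   Performance
Christy   20      sick        3000.0   56.6
Rocky     10      Casual      kkkk     22.4
jenifer   50      Emergency   2500.6   '51.6'
Tom       10      sick        Nan      46.2
Harry     nn      Casual      1800.1   '58.3'
Julie     22      sick        3600.2   'unknown'
Sam       5       Casual      Nan      47.2
Mady      6       sick        unknown  Nan
"""

# na_values=['Nan'] makes the literal string "Nan" load as a missing value
df = pd.read_fwf(io.StringIO(text), na_values=['Nan'])
print(df)
```

Because each of the three columns still contains at least one non-numeric string, pandas keeps them as text columns, so the remaining cells are strings rather than numbers.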

Now for the main task:

Define a function to check whether a cell is acceptable:

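def isAcceptable(cell):
    # Missing values and the bare word 'unknown' are allowed
    if pd.isna(cell) or cell == 'unknown':
        return True
    # Otherwise every character must be a digit or a decimal point
    return all(c.isdigit() or c == '.' for c in str(cell))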

I noticed that you accept NaN values. You also accept a cell if it contains only the string unknown, but you don't accept a cell if that word is enclosed in, e.g., quotes.

If you change your mind about what is or is not acceptable, change the above function accordingly.
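For example, the desired output in the question keeps jenifer, whose Performance value is the quoted '51.6'. One possible variant, shown here only as a sketch, strips surrounding single quotes before running the same character test. The name is_acceptable_lenient is hypothetical and not part of the original answer:

```python
import pandas as pd

def is_acceptable_lenient(cell):
    """Hypothetical variant: also accept numbers wrapped in single quotes."""
    # Missing values and the bare word 'unknown' are still allowed
    if pd.isna(cell) or cell == 'unknown':
        return True
    # Drop surrounding single quotes, then apply the same character test
    text = str(cell).strip("'")
    return text != '' and all(c.isdigit() or c == '.' for c in text)
```

With this variant a quoted number passes, while a quoted 'unknown' is still rejected, because after stripping the quotes its characters are not digits.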

Then, to keep only the rows in which all 3 mentioned columns hold acceptable values, run:

df[df[['Leaves', 'Salary', 'Performance']].applymap(isAcceptable).all(axis=1)]
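Putting the pieces together, here is a self-contained sketch of the whole filter. It builds the frame from a dict instead of reading text (an assumption made to keep the example short) and uses Series.map on each column, since DataFrame.applymap is deprecated in recent pandas versions (2.1 and later) in favor of DataFrame.map:

```python
import numpy as np
import pandas as pd

# The question's data, with numeric columns stored as strings and
# "Nan" already converted to real missing values.
df = pd.DataFrame({
    'Emp_Name': ['Christy', 'Rocky', 'jenifer', 'Tom',
                 'Harry', 'Julie', 'Sam', 'Mady'],
    'Leaves': ['20', '10', '50', '10', 'nn', '22', '5', '6'],
    'Leave_Type': ['sick', 'Casual', 'Emergency', 'sick',
                   'Casual', 'sick', 'Casual', 'sick'],
    'Salary': ['3000.0', 'kkkk', '2500.6', np.nan,
               '1800.1', '3600.2', np.nan, 'unknown'],
    'Performance': ['56.6', '22.4', "'51.6'", '46.2',
                    "'58.3'", "'unknown'", '47.2', np.nan],
})

def isAcceptable(cell):
    # Missing values and the bare word 'unknown' are allowed
    if pd.isna(cell) or cell == 'unknown':
        return True
    # Otherwise every character must be a digit or a decimal point
    return all(c.isdigit() or c == '.' for c in str(cell))

cols = ['Leaves', 'Salary', 'Performance']
# Check every cell, then keep rows where all three checks pass
mask = df[cols].apply(lambda col: col.map(isAcceptable)).all(axis=1)
result = df[mask].reset_index(drop=True)
print(result)
```

On this sample the rows kept are Christy, Tom, Sam and Mady. Note that jenifer is dropped, because '51.6' contains quote characters; if that row should survive, as in the desired output, the check has to tolerate quotes.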
