
Dropping rows in pandas with a certain condition

I have a list of IDs and a dataframe in which one of the columns is ID. I want to drop all rows in the dataframe where the ID is not one of the IDs in the list. This is the code I use:

df = df.drop(df[df.ID not in list_IDs].index)

but I get this error message:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

What am I doing wrong?

Try this:

# keep only the rows whose ID is in list_IDs (.ix was removed in pandas 1.0)
df = df[df.ID.isin(list_IDs)]

Explanation

Constructions like df.ID not in list_IDs won't work even in vanilla Python, because the in operator tests whether the left operand is itself an element of the list, not whether each of its items is:

In [12]: [1,2,3] in [1,2,3]
Out[12]: False

In [13]: [1,2] in [1,2,3]
Out[13]: False

In pandas you want to use the .isin() method.

Data:

In [14]: list_IDs
Out[14]: [24, 12, 42, 44]

In [15]: df
Out[15]:
   ID   A
0  58  69
1  36  63
2  92  43
3  24  37
4  12  54
5  42   0
6  44  57
7  78  59
8  59  85
9  56  84

Demo

In [16]: df.ID.isin(list_IDs)
Out[16]:
0    False
1    False
2    False
3     True
4     True
5     True
6     True
7    False
8    False
9    False
Name: ID, dtype: bool

In [17]: df[df.ID.isin(list_IDs)]
Out[17]:
   ID   A
3  24  37
4  12  54
5  42   0
6  44  57

Negated isin()

In [18]: df[~df.ID.isin(list_IDs)]
Out[18]:
   ID   A
0  58  69
1  36  63
2  92  43
7  78  59
8  59  85
9  56  84

In [19]: ~df.ID.isin(list_IDs)
Out[19]:
0     True
1     True
2     True
3    False
4    False
5    False
6    False
7     True
8     True
9     True
Name: ID, dtype: bool
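For anyone who wants to run this outside of an IPython session, here is a minimal self-contained sketch using the same data as the demo above. The names mask, kept and dropped are only illustrative. It also shows that the drop-based approach from the question gives the same result once the condition is written with ~isin().

```python
import pandas as pd

# Same data as in the demo above.
list_IDs = [24, 12, 42, 44]
df = pd.DataFrame({
    "ID": [58, 36, 92, 24, 12, 42, 44, 78, 59, 56],
    "A":  [69, 63, 43, 37, 54, 0, 57, 59, 85, 84],
})

# Boolean mask: True where the row's ID appears in list_IDs.
mask = df["ID"].isin(list_IDs)

# Option 1: keep the matching rows directly.
kept = df[mask]

# Option 2: drop the non-matching rows by index label, as in the question.
dropped = df.drop(df[~mask].index)

print(kept)
print(kept.equals(dropped))  # both options select the same rows
```

Option 1 is usually the simpler one to read. Option 2 is only there to show how the original drop() call could be repaired.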

Check out the answer from unutbu at "Evaluating pandas series values with logical expressions and if-statements". Basically, pandas raises an error if you try to get a single True/False out of comparing a Series to a list, because it is not clear whether the user expects True only if all values in the Series match, or True if at least one value matches. Hence, specific methods such as .any() and .all() must be used instead.
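As a rough illustration of that point (the variable names here are made up for the example), .any() and .all() each reduce a boolean Series to one value, while asking for the truth value of the Series itself raises the error from the question:

```python
import pandas as pd

s = pd.Series([1, 2, 3])
matches = s.isin([1, 2])  # elementwise result: True, True, False

# Explicit reductions state which question is being asked.
any_match = bool(matches.any())  # at least one value matches
all_match = bool(matches.all())  # every value matches

# Asking for a single truth value without a reduction is ambiguous.
try:
    bool(matches)
    ambiguous = False
except ValueError:
    ambiguous = True

print(any_match, all_match, ambiguous)
```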

Addition: Why does array < 5 work then? Because there is no ambiguity: every value in the array is compared elementwise to 5, and the result is an array of booleans. If you instead ask whether array == [5,6] is true, it is not clear whether True or False is expected when the array equals the first element but not the second. In some circumstances you would want True, and in others you would want False. To get around the ambiguity, users are expected to use specific methods such as .any().
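A small sketch of the difference, assuming a NumPy array with the illustrative values [5, 7]: both comparisons work elementwise, and the trouble only starts when the result is used where a single True/False is needed, such as in an if statement.

```python
import numpy as np

array = np.array([5, 7])

less = array < 5        # elementwise comparison against a scalar
equal = array == [5, 6]  # elementwise comparison against a same-length list

# Using the elementwise result as a condition is what raises.
try:
    if equal:
        pass
    raised = False
except ValueError:
    raised = True

print(less.tolist(), equal.tolist(), raised)
print(bool(equal.any()), bool(equal.all()))
```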

import pandas as pd
x = pd.Series([1,2,3])

Now, think about how you expect Python to evaluate this:

(x in [1,2])

or, more directly:

pd.Series([1,2,3]) in [1,2]

As you can see, it raises:

"ValueError: The truth value of a Series is ambiguous"

What you are looking to do is this:

x.isin([1,2])
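Putting the two side by side as a runnable sketch (raised, message and result are just names for this example):

```python
import pandas as pd

x = pd.Series([1, 2, 3])

# The list membership test compares x with each element and then needs
# a single truth value for the comparison result, which a Series refuses to give.
try:
    x in [1, 2]
    raised = False
    message = ""
except ValueError as err:
    raised = True
    message = str(err)

# isin() performs the membership test per element instead.
result = x.isin([1, 2])

print(raised)
print(result.tolist())
```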
