How to remove DataFrame rows where a column's values are in a set?
I have a set, remove_set, and I want to remove all rows in a DataFrame where a column's value is in that set.
df = df[df.column_in_set not in remove_set]
This gives me the error:
'Series' objects are mutable, thus they cannot be hashed.
What is the most pandas-idiomatic/Pythonic way to solve this problem? I could iterate through the rows and figure out the ilocs to exclude, but that seems a little inelegant.
Some sample input and expected output.
Input:
column_in_set value_2 value_3
1 'a' 3
2 'b' 4
3 'c' 5
4 'd' 6
remove = set([2,4])
Output:
column_in_set value_2 value_3
1 'a' 3
3 'c' 5
To make the selection you can write:
df[~df['column_in_set'].isin(remove)]
isin() simply checks whether each value of the column/Series is in a set (or list or other iterable), returning a boolean Series.
In this case, we want to include only the rows of the DataFrame whose column_in_set value is not in remove, so we invert the boolean values with ~ and then use the result to index the DataFrame.
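As a minimal runnable sketch of the approach above, assuming the sample data from the question is built by hand (the DataFrame construction and the names mask and filtered are illustrative, not part of the original post):

```python
import pandas as pd

# Sample data mirroring the question's input (constructed here for illustration).
df = pd.DataFrame({
    "column_in_set": [1, 2, 3, 4],
    "value_2": ["a", "b", "c", "d"],
    "value_3": [3, 4, 5, 6],
})
remove = {2, 4}

# isin() tests each element of the column for membership in `remove`
# and returns a boolean Series aligned with df's index.
mask = df["column_in_set"].isin(remove)

# ~ inverts the mask, so only rows whose value is NOT in `remove` are kept
# (here, the rows where column_in_set is 1 and 3).
filtered = df[~mask]
print(filtered)
```

The original `not in` attempt fails because Python's `in` operator asks whether the whole Series, as a single object, is a member of the set. That requires hashing the Series, which raises the error shown in the question. isin() performs the membership test element by element instead.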