Filtering dataframe based on variable number of conditions

I have a dataframe like this, for example:

import pandas as pd

df = pd.DataFrame({'A': ['a', 'a', 'b', 'c', 'a', 'b'], 'B': [1, 2, 3, 4, 5, 6]})

What I need is to filter the df based on the value in column 'A'. The problem is that the values to filter by are supplied by the end user. For example:

cond = ['a', 'b']

means that the user wants to filter the df and keep all rows where column 'A' is 'a' or 'b'. So in this case I'll need to filter the df with this condition:

df = df.loc[(df['A'] == 'a') | (df['A'] == 'b')]

But next time the values in the cond list may be different, and I need to account for that. So far I've tried a for loop. I was pretty certain it wasn't going to work... and it didn't:

for item in cond:
    df = df.loc[df['A'] == item]

I've also tried passing a generator to df.query() and had high hopes for it, but that didn't work either. Unfortunately, the method doesn't accept generators:

df = df.query(f'A == {x}' for x in cond)
# or
df = df.query('A == @x' for x in cond)

Not quite sure what else to try. Has anyone dealt with this type of problem before?

You can try:

df = df.loc[df['A'].isin(cond)]
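To see why this covers any number of user-supplied values, here is a minimal sketch using the sample frame from the question. It compares the isin result with the hand-written condition, and also shows how the loop from the question could be repaired. The loop in the question fails because each pass filters the already-filtered frame, so the conditions are effectively ANDed; the names mask, manual, loop_mask and looped are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'a', 'b', 'c', 'a', 'b'], 'B': [1, 2, 3, 4, 5, 6]})
cond = ['a', 'b']  # user-supplied values; the list may have any length

# isin builds a single boolean mask: True where A equals any value in cond
mask = df['A'].isin(cond)
filtered = df.loc[mask]

# The hand-written condition from the question, for comparison
manual = df.loc[(df['A'] == 'a') | (df['A'] == 'b')]

# The loop from the question, repaired: OR the per-value masks together
# instead of re-filtering df on every pass
loop_mask = pd.Series(False, index=df.index)
for item in cond:
    loop_mask |= df['A'] == item
looped = df.loc[loop_mask]

print(filtered)
```

All three approaches select the same rows here; isin is simply the shortest way to write it.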

You can also try the alternative from @BEN_YO:

df.query('A==@cond')
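As a rough sketch of how the query version might be wrapped for reuse: inside a query string, @name refers to a Python variable in the calling scope, and comparing a column to a list with == behaves like a membership test. The helper name filter_by_values is hypothetical, not part of pandas.

```python
import pandas as pd

def filter_by_values(frame, values):
    """Keep rows whose column 'A' matches any entry in values (hypothetical helper)."""
    # @values picks up the local variable; == against a list acts like "in"
    return frame.query('A == @values')

df = pd.DataFrame({'A': ['a', 'a', 'b', 'c', 'a', 'b'], 'B': [1, 2, 3, 4, 5, 6]})

two_values = filter_by_values(df, ['a', 'b'])
one_value = filter_by_values(df, ['c'])
print(two_values)
```

Unlike the attempts in the question, this passes one string to query rather than a generator of strings, which is what the method expects.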
