Filter df by list of allowable combinations of column values

Say I have a dataframe like this:

   Animal  Color
0     Dog  White
1     Cat  Black
2     Dog  Black
3     Dog  Brown
4  Rabbit  Brown

And I want to get all indices which match these tuples: [('Cat', 'Black'), ('Dog', 'Brown')]. So that would be [1,3] in this case.

I can't do something like df[np.isin(df['Animal'], ['Cat', 'Dog']) & np.isin(df['Color'], ['Black', 'Brown'])], because that checks each column independently and would give me [1,2,3].
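To make the over-matching concrete, here is a minimal sketch that rebuilds the example frame and runs the per-column test. The variable name `naive` and the way the frame is constructed are illustrative assumptions, not part of the original code.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    'Animal': ['Dog', 'Cat', 'Dog', 'Dog', 'Rabbit'],
    'Color': ['White', 'Black', 'Black', 'Brown', 'Brown'],
})

# Each column is tested on its own, so any allowed animal combined with
# any allowed color passes, including the unwanted ('Dog', 'Black') row.
naive = np.isin(df['Animal'], ['Cat', 'Dog']) & np.isin(df['Color'], ['Black', 'Brown'])

print(df.index[naive].tolist())
```

Row 2 slips through because 'Dog' is an allowed animal and 'Black' is an allowed color, even though that pair is not one of the tuples.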

If this were just one column I would use df[np.isin(df[col], ls)].

If I only cared about one tuple I could have done df[(df[col0] == tup[0]) & (df[col1] == tup[1])].

I just don't know how to combine the two concepts.

Here is a way with the pandas MultiIndex. I changed the example to have a red dog:

from io import StringIO
import pandas as pd

data = '''   Animal  Color
0     Dog  White
1     Cat  Black
2     Dog  Red
3     Dog  Brown
4  Rabbit  Brown
'''
df = pd.read_csv(StringIO(data), sep=r'\s+', engine='python', index_col=0)

to_keep = [('Cat', 'Black'), ('Dog', 'Red')]

mask = pd.MultiIndex.from_frame(df[['Animal', 'Color']]).isin(to_keep)

print(df.loc[mask])

  Animal  Color
1    Cat  Black
2    Dog    Red
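Since the question asks for the matching indices rather than the rows, the same mask can be applied to `df.index`. This is a small sketch on the question's original data (no red dog); the name `matching` is an assumption made for illustration.

```python
import pandas as pd

# The frame and tuples from the question.
df = pd.DataFrame({
    'Animal': ['Dog', 'Cat', 'Dog', 'Dog', 'Rabbit'],
    'Color': ['White', 'Black', 'Black', 'Brown', 'Brown'],
})
to_keep = [('Cat', 'Black'), ('Dog', 'Brown')]

# Pair the two columns row by row, then test each pair for membership.
mask = pd.MultiIndex.from_frame(df[['Animal', 'Color']]).isin(to_keep)

# Apply the boolean mask to the index to get the row labels.
matching = df.index[mask].tolist()
print(matching)
```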

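Let's try broadcasting. Here a is the list of tuples to match, and df is assumed to hold only the Animal and Color columns, in that order:

import numpy as np

a = [('Cat', 'Black'), ('Dog', 'Brown')]  # the tuples to match
mask = (df.values[:, None, :] == np.array(a)).all(-1).any(-1)

df[mask]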

Output:

  Animal  Color
1    Cat  Black
3    Dog  Brown
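The one-liner is easier to follow when split into its intermediate shapes. The sketch below uses plain NumPy arrays in place of `df.values`; the names `rows`, `pairs`, `eq` and `row_matches_pair` are illustrative assumptions.

```python
import numpy as np

# Stand-in for df.values: one row per record, columns are (Animal, Color).
rows = np.array([
    ['Dog', 'White'],
    ['Cat', 'Black'],
    ['Dog', 'Black'],
    ['Dog', 'Brown'],
    ['Rabbit', 'Brown'],
])
pairs = np.array([('Cat', 'Black'), ('Dog', 'Brown')])

# (5, 1, 2) compared with (2, 2) broadcasts to (5, 2, 2):
# every row against every pair, field by field.
eq = rows[:, None, :] == pairs

# A row matches a pair only if both fields agree: shape (5, 2).
row_matches_pair = eq.all(-1)

# Keep a row if it matches at least one pair: shape (5,).
mask = row_matches_pair.any(-1)
print(mask)
```

The intermediate array has one entry per row, per tuple, per column, so memory grows with the product of the number of rows and the number of tuples.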

You can just create a Boolean series that contains your logic, by building one condition per tuple and OR-ing them together:

from functools import reduce
from operator import or_

criterion = [('Cat', 'Black'), ('Dog', 'Brown')]

# One (Animal == a) & (Color == c) mask per tuple, combined with |.
# This works for any number of tuples, not just two.
cond = reduce(or_, ((df['Animal'] == a) & (df['Color'] == c) for a, c in criterion))

print(df[cond])

Output:

  Animal  Color
1    Cat  Black
3    Dog  Brown
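The same OR-of-ANDs idea can be sketched for an arbitrary number of tuples with `np.logical_or.reduce`. That function is my choice here, not something the answer above uses, and the third tuple and the name `per_pair` are added purely for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Animal': ['Dog', 'Cat', 'Dog', 'Dog', 'Rabbit'],
    'Color': ['White', 'Black', 'Black', 'Brown', 'Brown'],
})

# Three tuples this time (the third one is an illustrative addition).
criterion = [('Cat', 'Black'), ('Dog', 'Brown'), ('Rabbit', 'Brown')]

# One boolean Series per tuple: both columns must match that tuple.
per_pair = [(df['Animal'] == a) & (df['Color'] == c) for a, c in criterion]

# OR the per-tuple masks together; the result is a boolean NumPy array.
cond = np.logical_or.reduce(per_pair)

print(df[cond])
```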

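You could loop over the rows (here with a list comprehension) to pull the indices, where keep is the list of tuples to match:

keep = [('Cat', 'Black'), ('Dog', 'Brown')]

df.loc[[ind
        for ind, a, b in zip(df.index, df.Animal, df.Color)
        if (a, b) in keep]]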

  Animal  Color
1    Cat  Black
3    Dog  Brown

If the indices are not important, you could set the index to the two columns, select with .loc, and then reset the index:

df.set_index(["Animal", "Color"], append=False, drop=False).loc[keep].reset_index(
    drop=True
)

  Animal  Color
0    Cat  Black
1    Dog  Brown
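When the original indices are not needed, a related option is an inner merge against a small frame of allowed pairs. This is a different technique from the set_index approach above, sketched here under the assumption that `keep` contains no duplicate tuples; the name `allowed` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Animal': ['Dog', 'Cat', 'Dog', 'Dog', 'Rabbit'],
    'Color': ['White', 'Black', 'Black', 'Brown', 'Brown'],
})
keep = [('Cat', 'Black'), ('Dog', 'Brown')]

# Turn the allowed pairs into a two-column frame with matching column names.
allowed = pd.DataFrame(keep, columns=['Animal', 'Color'])

# An inner merge keeps only rows whose (Animal, Color) pair appears in `allowed`.
# The result gets a fresh 0..n-1 index, so the original row labels are lost.
result = df.merge(allowed, on=['Animal', 'Color'], how='inner')

print(result)
```

If `keep` did contain the same tuple twice, the merge would return the matching rows twice, so deduplicating the list first would be a sensible precaution.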

