Filter df by a list of allowed column-value combinations

Question

Say I have a dataframe like this:

   Animal  Color
0     Dog  White
1     Cat  Black
2     Dog  Black
3     Dog  Brown
4  Rabbit  Brown

I want to get all indices whose rows match one of these tuples: [('Cat', 'Black'), ('Dog', 'Brown')]. In this case that would be [1, 3].

I can't do something like df[np.isin(df['Animal'], ['Cat', 'Dog']) & np.isin(df['Color'], ['Black', 'Brown'])], because that would give me [1, 2, 3].

If this were just one column, I would use df[np.isin(df[col], ls)].

If I only cared about one tuple, I could have done df[(df[col0] == tup[0]) & (df[col1] == tup[1])].

I just don't know how to combine the two concepts.
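To make the problem concrete, here is a small sketch of why the per-column test over-selects. The names wanted, per_column_idx and row_wise_idx are illustrative, not from the question. Each column is checked independently, so ('Dog', 'Black') passes even though it is not one of the requested pairs.

```python
import numpy as np
import pandas as pd

# The example frame from the question.
df = pd.DataFrame({
    "Animal": ["Dog", "Cat", "Dog", "Dog", "Rabbit"],
    "Color": ["White", "Black", "Black", "Brown", "Brown"],
})
wanted = [("Cat", "Black"), ("Dog", "Brown")]

# Per-column membership: each column is tested on its own,
# so any allowed animal combined with any allowed color passes.
per_column = np.isin(df["Animal"], ["Cat", "Dog"]) & np.isin(df["Color"], ["Black", "Brown"])
per_column_idx = df.index[per_column].tolist()

# Row-wise membership: each (Animal, Color) pair is tested as a unit.
pairs = zip(df["Animal"], df["Color"])
row_wise_idx = [i for i, pair in zip(df.index, pairs) if pair in wanted]

print(per_column_idx)  # [1, 2, 3]
print(row_wise_idx)    # [1, 3]
```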

Answer 1

Here is a way with the pandas MultiIndex. I changed the example to have a red dog:

from io import StringIO
import pandas as pd

data = '''   Animal  Color
0     Dog  White
1     Cat  Black
2     Dog  Red
3     Dog  Brown
4  Rabbit  Brown
'''
df = pd.read_csv(StringIO(data), sep=r'\s+', engine='python', index_col=0)

# (Animal, Color) combinations to keep
to_keep = [('Cat', 'Black'),
           ('Dog', 'Red')]

mask = pd.MultiIndex.from_frame(df[['Animal', 'Color']]).isin(to_keep)

print(df.loc[mask])

  Animal  Color
1    Cat  Black
2    Dog    Red
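Since the question asks for the indices rather than the rows, the same mask can be applied to df.index. The following is a minimal sketch on the original data from the question (no red dog); the names wanted and matching_idx are illustrative.

```python
import pandas as pd

# The example frame from the question.
df = pd.DataFrame({
    "Animal": ["Dog", "Cat", "Dog", "Dog", "Rabbit"],
    "Color": ["White", "Black", "Black", "Brown", "Brown"],
})
wanted = [("Cat", "Black"), ("Dog", "Brown")]

# Build a MultiIndex from the two columns; isin then compares whole tuples.
mask = pd.MultiIndex.from_frame(df[["Animal", "Color"]]).isin(wanted)

# Apply the boolean mask to the index instead of the frame.
matching_idx = df.index[mask].tolist()
print(matching_idx)  # [1, 3]
```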

Answer 2

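Let's try broadcasting:

import numpy as np

a = [('Cat', 'Black'), ('Dog', 'Brown')]  # allowed (Animal, Color) pairs

# Compare every row with every tuple; keep rows that fully equal any tuple
mask = (df[['Animal', 'Color']].values[:, None, :] == np.array(a)).all(-1).any(-1)

df[mask]

Output: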

  Animal  Color
1    Cat  Black
3    Dog  Brown
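The one-liner is easier to follow when split into steps with the array shapes written out. This is a sketch of the same idea, not a different method. The intermediate names are illustrative, and dtype=object is an assumption made here so that both sides of the comparison hold plain Python strings.

```python
import numpy as np
import pandas as pd

# The example frame from the question.
df = pd.DataFrame({
    "Animal": ["Dog", "Cat", "Dog", "Dog", "Rabbit"],
    "Color": ["White", "Black", "Black", "Brown", "Brown"],
})
wanted = np.array([("Cat", "Black"), ("Dog", "Brown")], dtype=object)  # shape (2, 2)

# Insert a middle axis so each row can be paired with each wanted tuple.
rows = df[["Animal", "Color"]].to_numpy()[:, None, :]  # shape (5, 1, 2)

# Broadcasting (5, 1, 2) against (2, 2) gives one comparison per row, tuple and column.
cell_equal = rows == wanted                  # shape (5, 2, 2)

# A row matches a tuple only if every column is equal.
row_equals_tuple = cell_equal.all(axis=-1)   # shape (5, 2)

# A row is kept if it matches at least one tuple.
mask = row_equals_tuple.any(axis=-1)         # shape (5,)

print(df.index[mask].tolist())  # [1, 3]
```

The intermediate array holds rows × tuples × columns booleans, so memory grows with the product of the frame length and the number of tuples.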

Answer 3

You can build a Boolean Series that combines one condition per tuple:

from functools import reduce
import operator

criterion = [('Cat', 'Black'), ('Dog', 'Brown')]

# OR together one (Animal, Color) equality test per tuple; works for any number of tuples
cond = reduce(operator.or_, [(df['Animal'] == animal) & (df['Color'] == color) for animal, color in criterion])

print(df[cond])

Output:

  Animal  Color
1    Cat  Black
3    Dog  Brown

Answer 4

You could use a list comprehension to pull the indices:

keep = [('Cat', 'Black'), ('Dog', 'Brown')]

df.loc[[ind
        for ind, a, b in zip(df.index, df.Animal, df.Color)
        if (a, b) in keep]]

  Animal  Color
1    Cat  Black
3    Dog  Brown

If the indices are not important, you could set the index and then reset it:

df.set_index(["Animal", "Color"], append=False, drop=False).loc[keep].reset_index(
    drop=True
)

  Animal  Color
0    Cat  Black
1    Dog  Brown

4 answers

Answer 1: score 2, accepted, 2020-08-26 20:45:11

Answer 2: score 1, 2020-08-26 20:36:10

Answer 3: score 0, 2020-08-26 21:00:03

Answer 4: score 0, 2020-08-26 22:16:51
