Filter df by list of allowable combinations of column values
Say I have a dataframe like this:
Animal Color
0 Dog White
1 Cat Black
2 Dog Black
3 Dog Brown
4 Rabbit Brown
And I want to get all indices that match these tuples: [('Cat', 'Black'), ('Dog', 'Brown')]. So in this case that would be [1,3].
I can't do something like df[np.isin(df['Animal'], ['Cat', 'Dog']) & np.isin(df['Color'], ['Black', 'Brown'])] because that would give me [1,2,3].
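A minimal reproduction of the question's frame makes the failure concrete. Each `np.isin` call tests one column on its own, so the combined mask accepts any listed animal paired with any listed color (the cross product), which lets `('Dog', 'Black')` at index 2 through. The names `allowed`, `animals`, `colors` and `per_column` are illustrative.

```python
import numpy as np
import pandas as pd

# The question's example frame
df = pd.DataFrame(
    {"Animal": ["Dog", "Cat", "Dog", "Dog", "Rabbit"],
     "Color": ["White", "Black", "Black", "Brown", "Brown"]}
)
allowed = [("Cat", "Black"), ("Dog", "Brown")]

# Split the pairs into per-column value lists, as the question does
animals = [a for a, _ in allowed]
colors = [c for _, c in allowed]

# Each isin() checks one column independently, so this mask accepts
# every animal x color combination, not just the listed pairs
per_column = np.isin(df["Animal"], animals) & np.isin(df["Color"], colors)
print(df.index[per_column].tolist())  # [1, 2, 3]
```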
If this were just one column, I would use df[np.isin(df[col], ls)].
If I only cared about one tuple, I could do df[(df[col0] == tup[0]) & (df[col1] == tup[1])].
I just don't know how to combine the two concepts.
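One direct way to combine the two ideas is to build the single-tuple mask for every allowed pair and OR the masks together. A minimal sketch, assuming the question's two columns; `pairs_mask` is a hypothetical helper name, not a pandas function. It costs one pass over the frame per pair, which is fine for a handful of pairs.

```python
import pandas as pd

def pairs_mask(df, col0, col1, pairs):
    """Boolean Series: True where (df[col0], df[col1]) equals any pair in `pairs`."""
    mask = pd.Series(False, index=df.index)
    for v0, v1 in pairs:
        # Same single-tuple test as in the question, OR-ed into the running mask
        mask |= (df[col0] == v0) & (df[col1] == v1)
    return mask

df = pd.DataFrame(
    {"Animal": ["Dog", "Cat", "Dog", "Dog", "Rabbit"],
     "Color": ["White", "Black", "Black", "Brown", "Brown"]}
)
mask = pairs_mask(df, "Animal", "Color", [("Cat", "Black"), ("Dog", "Brown")])
print(df[mask].index.tolist())  # [1, 3]
```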
Here is a way with the pandas MultiIndex. I changed the example to have a red dog:
from io import StringIO
import pandas as pd
data = ''' Animal Color
0 Dog White
1 Cat Black
2 Dog Red
3 Dog Brown
4 Rabbit Brown
'''
df = pd.read_csv(StringIO(data), sep=r'\s+', engine='python', index_col=0)  # raw string avoids the invalid '\s' escape warning
to_keep = [('Cat', 'Black'),
('Dog', 'Red'),
]
mask = pd.MultiIndex.from_frame(df[['Animal', 'Color']]).isin(to_keep)
print(df.loc[mask])
Animal Color
1 Cat Black
2 Dog Red
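Applied to the question's original frame and pairs, the same `MultiIndex.from_frame(...).isin(...)` idea returns the requested index labels directly. A sketch under the assumption of pandas 0.24 or later (where `pd.MultiIndex.from_frame` is available); the names `allowed` and `keys` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"Animal": ["Dog", "Cat", "Dog", "Dog", "Rabbit"],
     "Color": ["White", "Black", "Black", "Brown", "Brown"]}
)
allowed = [("Cat", "Black"), ("Dog", "Brown")]

# Treat each (Animal, Color) row as one tuple-valued key and test membership
keys = pd.MultiIndex.from_frame(df[["Animal", "Color"]])
mask = keys.isin(allowed)  # NumPy bool array, one entry per row
print(df.index[mask].tolist())  # [1, 3]
```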
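Let's try broadcasting:
import numpy as np
a = np.array([('Cat', 'Black'), ('Dog', 'Brown')])  # allowed pairs, shape (k, 2)
mask = (df[['Animal', 'Color']].to_numpy()[:, None, :] == a).all(-1).any(-1)
df[mask]
Output: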
Animal Color
1 Cat Black
3 Dog Brown
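The broadcasting trick compares every row against every allowed pair at once: an (n, 1, 2) array of rows against a (k, 2) array of pairs broadcasts to an (n, k, 2) boolean array; `.all(-1)` requires both columns to match a pair, and `.any(-1)` keeps a row that matches at least one pair. A sketch that prints the intermediate shapes (the names `pairs`, `rows`, `eq`, `match_pair` are illustrative). Memory grows with n times k, so this suits a modest number of pairs.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"Animal": ["Dog", "Cat", "Dog", "Dog", "Rabbit"],
     "Color": ["White", "Black", "Black", "Brown", "Brown"]}
)
pairs = np.array([("Cat", "Black"), ("Dog", "Brown")], dtype=object)  # shape (k, 2)

rows = df[["Animal", "Color"]].to_numpy()[:, None, :]  # shape (n, 1, 2)
eq = rows == pairs                                     # shape (n, k, 2)
match_pair = eq.all(axis=-1)    # (n, k): row i equals pair j in both columns
mask = match_pair.any(axis=-1)  # (n,): row i equals at least one pair
print(rows.shape, eq.shape, mask.shape)  # (5, 1, 2) (5, 2, 2) (5,)
print(df.index[mask].tolist())  # [1, 3]
```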
You can create a Boolean Series that encodes your logic as follows:
from functools import reduce
import operator
criterion = [('Cat', 'Black'), ('Dog', 'Brown')]
# OR together one (Animal, Color) equality mask per allowed pair; works for any number of pairs
cond = reduce(operator.or_, ((df['Animal'] == animal) & (df['Color'] == color) for animal, color in criterion))
print(df[cond])
Output:
Animal Color
1 Cat Black
3 Dog Brown
You could use a for loop to pull the indices:
keep = [('Cat', 'Black'), ('Dog', 'Brown')]
df.loc[[ind
        for ind, a, b in zip(df.index, df.Animal, df.Color)
        if (a, b) in keep]]
Animal Color
1 Cat Black
3 Dog Brown
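Since the question asks for the index labels themselves, the same loop can return them directly. Converting the allowed pairs to a `set` makes each `(a, b) in ...` check constant time on average instead of a scan over the list, which starts to matter once there are many pairs. A sketch; `matching_indices` is a hypothetical helper name.

```python
import pandas as pd

def matching_indices(df, col0, col1, pairs):
    """Return index labels whose (col0, col1) values form one of `pairs`."""
    allowed = set(pairs)  # tuples are hashable, so membership is O(1) on average
    return [ind for ind, a, b in zip(df.index, df[col0], df[col1]) if (a, b) in allowed]

df = pd.DataFrame(
    {"Animal": ["Dog", "Cat", "Dog", "Dog", "Rabbit"],
     "Color": ["White", "Black", "Black", "Brown", "Brown"]}
)
idx = matching_indices(df, "Animal", "Color", [("Cat", "Black"), ("Dog", "Brown")])
print(idx)  # [1, 3]
print(df.loc[idx])
```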
If the indices are not important, you could use set_index and reset_index:
keep = [('Cat', 'Black'), ('Dog', 'Brown')]
# drop=False keeps Animal/Color as columns, so the MultiIndex can be discarded afterwards
df.set_index(["Animal", "Color"], drop=False).loc[keep].reset_index(drop=True)
Animal Color
0 Cat Black
1 Dog Brown
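If some pairs in `keep` may not occur in the frame, the `.loc[keep]` step can fail; depending on the pandas version, `.loc` with a list of labels raises a `KeyError` when any label is absent. One swapped-in alternative (not part of the answer above) is an inner `merge` against a small frame of allowed pairs: absent pairs are simply ignored, and a `reset_index()` / `set_index()` round trip keeps the original index labels. A sketch; the `allowed` frame and its extra `('Cat', 'White')` pair are illustrative, and `drop_duplicates()` guards against repeated pairs duplicating rows.

```python
import pandas as pd

df = pd.DataFrame(
    {"Animal": ["Dog", "Cat", "Dog", "Dog", "Rabbit"],
     "Color": ["White", "Black", "Black", "Brown", "Brown"]}
)
# Allowed pairs as a frame; ('Cat', 'White') never occurs in df and is just ignored
allowed = pd.DataFrame(
    [("Cat", "Black"), ("Dog", "Brown"), ("Cat", "White")],
    columns=["Animal", "Color"],
)

out = (
    df.reset_index()  # move the original labels into a column named "index"
      .merge(allowed.drop_duplicates(), on=["Animal", "Color"], how="inner")
      .set_index("index")  # restore the original labels
      .sort_index()
)
print(out.index.tolist())  # [1, 3]
```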
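Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.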