Find rows where either of 2 columns contains any value from a third, split column in Pandas

Question

Given a Pandas DataFrame like this:

A           B           C
-------------------------
A. b.       b. d.       a
c.          c.          d
a. k.       b.          b
a.          b.          a, B

Code:

import pandas as pd

df = pd.DataFrame({
    'A': ['A. b.', 'c.', 'a. k.', 'a.'],
    'B': ['b. d.', 'c.', 'b.', 'b.'],
    'C': ['a', 'd', 'b', 'a, B']
})

I want to select all rows where A or B contains any value from C. Here, the result would be:

A           B           C
-------------------------
A. b.       b. d.       a
a. k.       b.          b
a.          b.          a, B

All cells contain values in a simple delimited format, so they can be parsed with split.

I've tried:

df[df['A'].str.contains(df['C'].split(','))]

But no success so far.

Answer 1

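Assuming from your sample output that your comparisons are case-insensitive:

mask = pd.DataFrame({
           # set of lowercase characters appearing in A or B
           'AB': (df.A + ' ' + df.B).str.lower().map(set),
           # set of lowercase, whitespace-stripped values from C
           'C': df.C.str.lower().str.split(',').map(lambda vals: {v.strip() for v in vals})
       }).apply(lambda row: bool(row['AB'].intersection(row['C'])), axis=1)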

df[mask].reset_index(drop=True)

Description:

Combine columns A and B, lowercase the result, and convert each cell to the set of characters it contains. Split the delimited values in column C into a set as well, then check whether the intersection of AB and C is non-empty. Use the resulting boolean Series as a mask on your original dataframe. Because A and B are reduced to sets of single characters, this only works when every value in C is a single character, as in the sample data.
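If the values can be longer than one character, the same idea can be applied to whole tokens instead of characters. The following is only a sketch under assumptions not stated in the question: values are taken to be separated by '.', ',' or whitespace, and the helper name tokens is made up for illustration.

```python
import re
import pandas as pd

df = pd.DataFrame({
    'A': ['A. b.', 'c.', 'a. k.', 'a.'],
    'B': ['b. d.', 'c.', 'b.', 'b.'],
    'C': ['a', 'd', 'b', 'a, B'],
})

def tokens(cell):
    """Lowercase a cell and split it into a set of non-empty tokens.

    Assumption: values are separated by '.', ',' and/or whitespace.
    """
    return {t for t in re.split(r'[.,\s]+', cell.lower()) if t}

# Tokens found in A or B, and tokens found in C, one set per row
ab_tokens = (df['A'] + ' ' + df['B']).map(tokens)
c_tokens = df['C'].map(tokens)

# A row is kept when the two sets share at least one token
mask = pd.Series(
    [not ab.isdisjoint(c) for ab, c in zip(ab_tokens, c_tokens)],
    index=df.index,
)
result = df[mask].reset_index(drop=True)
print(result)
```

On the sample data this keeps the same three rows as the expected result in the question, but it compares whole tokens, so a value such as 'ab' in C would not match a cell that merely contains the letters a and b.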

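Timings:

def f():
    mask = pd.DataFrame({
               'AB': (df.A + ' ' + df.B).str.lower().map(set),
               'C': df.C.str.lower().str.split(',').map(lambda vals: {v.strip() for v in vals})
           }).apply(lambda row: bool(row['AB'].intersection(row['C'])), axis=1)

    return df[mask].reset_index(drop=True)

%timeit f()

Output as recorded with %timeit f, which omits the call parentheses and therefore measures only the lookup of the name f, not the filtering itself:

27.4 ns ± 0.483 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)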
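Outside IPython, a comparable measurement can be sketched with the standard library's timeit module, which calls the function object it is given. The call count n_calls is an arbitrary choice for illustration, and the mask is built with a plain list comprehension rather than the apply call above, so the numbers are not directly comparable with any %timeit figure.

```python
import timeit
import pandas as pd

df = pd.DataFrame({
    'A': ['A. b.', 'c.', 'a. k.', 'a.'],
    'B': ['b. d.', 'c.', 'b.', 'b.'],
    'C': ['a', 'd', 'b', 'a, B'],
})

def f():
    # Set of lowercase characters in A or B, one set per row
    ab = (df['A'] + ' ' + df['B']).str.lower().map(set)
    # Set of lowercase, whitespace-stripped values from C, one set per row
    c = df['C'].str.lower().str.split(',').map(lambda vals: {v.strip() for v in vals})
    # Keep rows where the two sets overlap
    mask = pd.Series([bool(x & y) for x, y in zip(ab, c)], index=df.index)
    return df[mask].reset_index(drop=True)

n_calls = 100  # arbitrary number of repetitions
elapsed = timeit.timeit(f, number=n_calls)  # f is called n_calls times
print(f"{elapsed / n_calls * 1e6:.1f} microseconds per call")
```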

Answer 2

I want to select all rows where A or B contain any value from C

You can use .isin() here. Here is a small example:

df1 = pd.DataFrame([[1,4,1], [2,5,4], [3,6,7]], columns=['a','b','c'])
df1.loc[df1['a'].isin(df1['c'])]

#output:

    a   b   c
0   1   4   1

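This returns row 0 because its value in column a, 1, appears in column c. Note that .isin() tests each value of a against every value in column c, not just the value in the same row, and it does not split delimited strings.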
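The difference between a column-wide membership test and a row-by-row comparison is easier to see on data where the two disagree. This is a small sketch; the frame df2 is invented for illustration and is not part of the question.

```python
import pandas as pd

# Hypothetical data: each value of 'a' occurs in 'c', but never in the same row
df2 = pd.DataFrame({'a': [1, 4, 3], 'c': [4, 1, 7]})

# True where the value of 'a' appears anywhere in column 'c'
column_wide = df2['a'].isin(df2['c'])

# True only where 'a' equals 'c' in the same row
row_wise = df2['a'].eq(df2['c'])

print(column_wide.tolist())  # [True, True, False]
print(row_wise.tolist())     # [False, False, False]
```

For the question as asked, where each row's A and B must be checked against that same row's C, a column-wide test like .isin() would also match values that occur only in other rows.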

Answer 1
1 2017-11-28 17:24:50

Answer 2
0 2017-11-28 17:09:39