
Find rows where one of 2 cols contains any value from a third split column in Pandas

Given a Pandas DataFrame like this:

A           B           C
-------------------------
A. b.       b. d.       a
c.          c.          d
a. k.       b.          b
a.          b.          a, B

Code:

import pandas as pd

df = pd.DataFrame({
    'A': ['A. b.', 'c.', 'a. k.', 'a.'],
    'B': ['b. d.', 'c.', 'b.', 'b.'],
    'C': ['a', 'd', 'b', 'a, B']
})

I want to select all rows where A or B contains any value from C. Here, the result would be:

A           B           C
-------------------------
A. b.       b. d.       a
a. k.       b.          b
a.          b.          a, B

All cells contain values in a simple delimited format (split can be used).

I've tried:

df[df['A'].str.contains(df['C'].split(','))]

But no success so far.

Assuming from your sample output that your comparisons are case-insensitive:

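mask = pd.DataFrame({
    # Lowercased characters appearing in A and B
    'AB': (df.A + df.B).str.lower().map(set),
    # Lowercased values listed in C, with spaces removed before splitting
    'C': df.C.str.lower().str.replace(' ', '', regex=False).str.split(',').map(set)
}).apply(lambda row: bool(row['AB'].intersection(row['C'])), axis=1)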

df[mask].reset_index(drop=True)

Description:

Concatenate columns A and B, lowercase the result, and convert each cell to the set of characters it contains. Split the delimited values in column C into a set as well, then check whether the intersection of the two sets is non-empty. Use the resulting boolean Series as a mask on the original DataFrame.
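Because A and B are reduced to individual characters, the approach above can only match single-character values. If the values may be longer, one possible variation is to compare whole tokens instead. The following is a minimal sketch, not part of the original answer: the helper names `tokens` and `row_matches` are hypothetical, and it assumes values in A and B are terminated by `.` while values in C are separated by `,`.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['A. b.', 'c.', 'a. k.', 'a.'],
    'B': ['b. d.', 'c.', 'b.', 'b.'],
    'C': ['a', 'd', 'b', 'a, B'],
})

def tokens(cell, sep):
    """Split a cell on `sep`, trim whitespace, lowercase, and drop empty pieces."""
    return {part.strip().lower() for part in cell.split(sep) if part.strip()}

def row_matches(row):
    # Assumption: '.' terminates values in A and B, ',' separates values in C.
    ab = tokens(row['A'], '.') | tokens(row['B'], '.')
    return bool(ab & tokens(row['C'], ','))

mask = df.apply(row_matches, axis=1)
result = df[mask].reset_index(drop=True)
print(result)
```

On the sample data this keeps the same three rows as the expected result in the question.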


Timings:

def f():
    mask = pd.DataFrame({
               'AB': (df.A + ' ' + df.B).str.lower().map(set),
               'C': df.C.str.split(',').map(set)
           }).apply(lambda row: bool(row['AB'].intersection(row['C'])), axis=1)

    df[mask].reset_index(drop=True)

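%timeit f

Output:

27.4 ns ± 0.483 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

Note: `%timeit f` without parentheses times only the lookup of the name f, not a call to it, so this figure is not the runtime of the filtering itself. Use `%timeit f()` to measure the call.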
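To time an actual call to the function outside IPython, the standard-library `timeit` module can be used. This is a minimal sketch under a few assumptions: `f` is reproduced from the timings code with a `return` added for convenience, and `runs = 100` is an arbitrary choice. No expected figure is given because it depends on the machine and the pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame({
    'A': ['A. b.', 'c.', 'a. k.', 'a.'],
    'B': ['b. d.', 'c.', 'b.', 'b.'],
    'C': ['a', 'd', 'b', 'a, B'],
})

def f():
    mask = pd.DataFrame({
        'AB': (df.A + ' ' + df.B).str.lower().map(set),
        'C': df.C.str.split(',').map(set),
    }).apply(lambda row: bool(row['AB'].intersection(row['C'])), axis=1)
    # Returning the result is an addition; the timed version discards it.
    return df[mask].reset_index(drop=True)

runs = 100  # arbitrary repeat count, chosen only for illustration
total = timeit.timeit(f, number=runs)  # passes the callable, which timeit invokes
print(f"{total / runs * 1e6:.1f} microseconds per call")
```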

I want to select all rows where A or B contain any value from C

You can use .isin() here. Here is a small example:

df1 = pd.DataFrame([[1,4,1], [2,5,4], [3,6,7]], columns=['a','b','c'])
df1.loc[df1['a'].isin(df1['c'])]

#output:

    a   b   c
0   1   4   1

This returns row 0 because the value 1 in column a also appears in column c.
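Note that `.isin()` tests each value against every value in the other column, not only against the value in the same row. A small sketch of the difference, reusing `df1` from the example; the variable names `anywhere` and `same_row` are only illustrative:

```python
import pandas as pd

df1 = pd.DataFrame([[1, 4, 1], [2, 5, 4], [3, 6, 7]], columns=['a', 'b', 'c'])

# Column-wide membership: is b[i] found anywhere in column c?
anywhere = df1['b'].isin(df1['c'])

# Row-wise comparison: is b[i] equal to c[i] in the same row?
same_row = df1['b'].eq(df1['c'])

print(anywhere.tolist())  # [True, False, False]
print(same_row.tolist())  # [False, False, False]
```

Row 0 passes the `.isin()` test because 4 occurs in column c at row 1, even though row 0's own c value is 1. For the question as asked, where each row's A and B are compared with that same row's C, a row-wise check is probably what is needed.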
