
How to compare each element of a delimited string in a pandas DataFrame column with the elements of a Python list

I have a DataFrame with a column of comma-delimited strings that has to be compared with a list. If the elements of the delimited string and the elements of the list intersect, that row should be kept.

For example:

import pandas as pd

test_lst = [20, 45, 35]
data = pd.DataFrame({'colA': [1, 2, 3],
                     'colB': ['20,45,50,60', '22,70,35', '10,90,100']})

This should give the output below, because the elements 20 and 45 appear both in the list and in the delimited text in the first row of the DataFrame.

Likewise, 35 is common to the list and row 2.

colA  colB
1     20,45,50,60
2     22,70,35

What I have tried is:

test_lst = [20, 45, 35]
data["colC"]= data['colB'].str.split(',')
data

# data["colC"].apply(lambda x: set(x).intersection(test_lst))
print(data[data['colC'].apply(lambda x: set(x).intersection(test_lst)).astype(bool)])
data

This does not give the required result.

Any help is appreciated.
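
The attempt above most likely returns no rows because `str.split` produces lists of strings such as '20', while `test_lst` holds integers, so the set intersection is always empty. A minimal sketch of the same idea with each piece converted to `int` first (the names `as_strings`, `wanted`, and `result` are introduced here for illustration only):

```python
import pandas as pd

test_lst = [20, 45, 35]
data = pd.DataFrame({'colA': [1, 2, 3],
                     'colB': ['20,45,50,60', '22,70,35', '10,90,100']})

# str.split yields strings ('20'), which never equal the ints in test_lst
as_strings = data['colB'].str.split(',')

# Convert each piece to int before intersecting with the list
wanted = set(test_lst)
mask = as_strings.apply(
    lambda parts: bool(wanted.intersection(int(p) for p in parts))
)

# Keep only the rows that share at least one element with test_lst
result = data[mask]
print(result)
```

Converting `test_lst` to strings instead would also make the two sides comparable; which direction is better depends on whether the column can contain non-numeric values.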

This might not be the best approach, but it works.

import pandas as pd

df = pd.DataFrame({'colA': [1, 2, 3],
                   'colB': ['20,45,50,60', '22,70,35', '10,90,100']})

test_lst = [20, 45, 35]

def match_element(row):
    # Convert the delimited string to integers so they can be compared with test_lst
    row_elements = [int(n) for n in row.split(',')]
    # True if at least one element of the row is also in test_lst
    return any(value in test_lst for value in row_elements)

# Keep only the rows whose colB shares an element with test_lst
mask = df['colB'].apply(match_element)
df = df[mask]

Output:

   colA         colB
0     1  20,45,50,60
1     2     22,70,35
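
As an alternative to a row-wise `apply`, the same filter can be sketched with pandas' own column operations. This assumes pandas 0.25 or later (for `Series.explode`) and that every piece in `colB` parses as an integer; the names `pieces` and `filtered` are illustrative and not part of the answer above:

```python
import pandas as pd

test_lst = [20, 45, 35]
df = pd.DataFrame({'colA': [1, 2, 3],
                   'colB': ['20,45,50,60', '22,70,35', '10,90,100']})

# One row per element; explode repeats the original index for every piece
pieces = df['colB'].str.split(',').explode().astype(int)

# Flag matching pieces, then collapse back to one flag per original row
mask = pieces.isin(test_lst).groupby(level=0).any()

# Boolean indexing aligns the mask with df by index label
filtered = df[mask]
print(filtered)
```

Whether this is faster than `apply` depends on the data size and the number of elements per row, so it is worth timing on real data before choosing.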
