How to compare each element of a delimited string in a pandas DataFrame column with the elements of a Python list

Question

I have a data frame with a delimited string column that has to be compared with a list. If the elements of the delimited string intersect with the elements of the list, that row should be kept.

For example:

import pandas as pd

test_lst = [20, 45, 35]
data = pd.DataFrame({'colA': [1, 2, 3],
                     'colB': ['20,45,50,60', '22,70,35', '10,90,100']})

The output should be as shown below, because the elements 20 and 45 appear both in the list and in the delimited text of the first row of the DataFrame.

Likewise, 35 intersects in row 2.

colA  colB
1     20,45,50,60
2     22,70,35

What I have tried is:

test_lst = [20, 45, 35]
data["colC"]= data['colB'].str.split(',')
data

# data["colC"].apply(lambda x: set(x).intersection(test_lst))
print(data[data['colC'].apply(lambda x: set(x).intersection(test_lst)).astype(bool)])
data

This does not give the required result.

Any help is appreciated.

Answer 1

This might not be the best approach, but it works.

import pandas as pd

test_lst = [20, 45, 35]
df = pd.DataFrame({'colA': [1, 2, 3],
                   'colB': ['20,45,50,60', '22,70,35', '10,90,100']})

def match_element(cell):
    # Convert the comma-separated string to integers so the values
    # compare equal to the integers in test_lst.
    cell_elements = [int(n) for n in cell.split(',')]
    # True if at least one element also appears in test_lst.
    return any(value in test_lst for value in cell_elements)

# Build a boolean mask and keep only the matching rows.
mask = df['colB'].apply(match_element)
df = df[mask]

Output:

   colA         colB
0     1  20,45,50,60
1     2     22,70,35
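
A likely reason the attempt in the question returns nothing is that str.split produces lists of strings ('20', '45', ...), while test_lst holds integers, so the set intersection is always empty. The following is a minimal sketch of a set-based variant that converts the list to strings instead of converting every cell to integers. The names test_set, mask and result are introduced here for illustration, and the sketch assumes the values in colB carry no surrounding whitespace or leading zeros.

```python
import pandas as pd

test_lst = [20, 45, 35]
data = pd.DataFrame({'colA': [1, 2, 3],
                     'colB': ['20,45,50,60', '22,70,35', '10,90,100']})

# str.split yields lists of strings, so compare against a set of strings
# rather than the original list of integers.
test_set = {str(n) for n in test_lst}

# isdisjoint is True when the two collections share no element, so negate it
# to get True for rows that have at least one element in common.
mask = data['colB'].str.split(',').apply(
    lambda parts: not test_set.isdisjoint(parts)
)

result = data[mask]
print(result)
```

Converting the short list once, rather than every cell, keeps the per-row work to a single set lookup per element. If the cells may contain spaces (for example '20, 45'), each part would need to be stripped first.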


Solution 1
0 2022-02-18 06:55:16
