How to compare each element of a delimited string in a pandas DataFrame column with the elements of a Python list

I have a DataFrame with a comma-delimited string column that has to be compared with a list. If the elements of the delimited string and the elements of the list intersect, that row should be kept.

For example:

import pandas as pd

test_lst = [20, 45, 35]
data = pd.DataFrame({'colA': [1, 2, 3],
                     'colB': ['20,45,50,60', '22,70,35', '10,90,100']})

The expected output is shown below. The first row is kept because 20 and 45 appear both in the list and in its delimited text. Likewise, the second row is kept because 35 appears in both.

colA colB
1 20,45,50,60
2 22,70,35
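The selection rule can be illustrated in plain Python before involving pandas. This is a minimal sketch; the helper name `common_elements` and the list `col_b` are hypothetical and only mirror the example data. It assumes every delimited piece parses as an integer.

```python
test_lst = [20, 45, 35]
col_b = ['20,45,50,60', '22,70,35', '10,90,100']  # hypothetical copy of colB

def common_elements(text, values):
    """Return the ints that appear both in the comma-delimited text and in values."""
    # int() matters: the string '20' never equals the integer 20
    return {int(piece) for piece in text.split(',')} & set(values)

for i, text in enumerate(col_b):
    common = common_elements(text, test_lst)
    # A row is kept when the intersection is non-empty
    print(i, sorted(common), bool(common))
# 0 [20, 45] True
# 1 [35] True
# 2 [] False
```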

What I have tried:

test_lst = [20, 45, 35]
data["colC"]= data['colB'].str.split(',')
data

# data["colC"].apply(lambda x: set(x).intersection(test_lst))
print(data[data['colC'].apply(lambda x: set(x).intersection(test_lst)).astype(bool)])
data

This does not give the required result.

Any help is appreciated.
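A likely reason the attempt finds no matches is a type mismatch: `str.split(',')` produces strings such as `'20'`, while `test_lst` holds the integers 20, 45 and 35. A string never compares equal to an int, so every intersection is empty. The sketch below reuses the question's names (`data`, `colC`, `test_lst`) and replaces `.astype(bool)` with `bool()` inside the lambda. The only substantive change is converting each piece with `int`, which assumes every piece is a valid integer.

```python
import pandas as pd

test_lst = [20, 45, 35]
data = pd.DataFrame({'colA': [1, 2, 3],
                     'colB': ['20,45,50,60', '22,70,35', '10,90,100']})

# str.split yields lists of strings, e.g. ['20', '45', '50', '60']
data['colC'] = data['colB'].str.split(',')

# As in the original attempt: strings are intersected with ints, so nothing matches
original_mask = data['colC'].apply(lambda x: bool(set(x).intersection(test_lst)))
print(original_mask.tolist())  # [False, False, False]

# Converting each piece to int before intersecting lets '20' match 20
mask = data['colC'].apply(lambda x: bool(set(map(int, x)).intersection(test_lst)))
print(data[mask])
```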

This might not be the best approach, but it works.

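import pandas as pd

df = pd.DataFrame({'colA': [1, 2, 3],
                   'colB': ['20,45,50,60', '22,70,35', '10,90,100']})

def match_element(row):
    # Split the delimited string and convert each piece to int,
    # so it can be compared with the integers in test_lst
    row_elements = [int(n) for n in row.split(',')]
    test_lst = [20, 45, 35]

    # True if at least one element of the row is also in test_lst
    return any(value in test_lst for value in row_elements)

# Boolean mask with one True/False per row of colB
mask = df['colB'].apply(match_element)
df = df[mask]
print(df)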

Output:

   colA         colB
0     1  20,45,50,60
1     2     22,70,35
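As an alternative to a row-by-row Python function, the same filter can be expressed with pandas operations. This is a different technique from the answer above, offered as a sketch: it assumes pandas 0.25 or later (for `Series.explode`) and that every delimited piece is a valid integer.

```python
import pandas as pd

test_lst = [20, 45, 35]
df = pd.DataFrame({'colA': [1, 2, 3],
                   'colB': ['20,45,50,60', '22,70,35', '10,90,100']})

# One row per delimited element; the original row index is repeated
elements = df['colB'].str.split(',').explode().astype(int)

# Flag each element, then collapse back to one flag per original row
mask = elements.isin(test_lst).groupby(level=0).any()

filtered = df[mask]
print(filtered)
```

The `groupby(level=0).any()` step relies on `explode` keeping the original index labels, so it assumes `df` has a unique index.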

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM