I have a data frame that has a delimited string column that has to be compared with a list. If the result of the elements in the delimited string and elements of the list intersect, consider that row.
For example
test_lst = [20, 45, 35]
data = pd.DataFrame({'colA': [1, 2, 3],
'colB': ['20,45,50,60', '22,70,35', '10,90,100']})
should have the output as because the elements 20,45 are common in both the list variable and delimited text in DF in the first row.
Likewise, 35 intersects in row 2
colA | colB |
---|---|
1 | 20,45,50,60 |
2 | 22,70,35 |
What I have tried is
test_lst = [20, 45, 35]
data["colC"]= data['colB'].str.split(',')
data
# data["colC"].apply(lambda x: set(x).intersection(test_lst))
print(data[data['colC'].apply(lambda x: set(x).intersection(test_lst)).astype(bool)])
data
Does not give the required result.
Any help is appreciated
This might not be the best approach, but it works.
import pandas as pd
df = pd.DataFrame({'colA': [1, 2, 3],
'colB': ['20,45,50,60', '22,70,35', '10,90,100']})
def match_element(row):
row_elements = [int(n) for n in row.split(',')]
test_lst = [20, 45, 35]
if [value for value in row_elements if value in test_lst]:
return True
else:
return False
mask = df['colB'].apply(lambda row: match_element(row))
df = df[mask]
output:
colA | colB | |
---|---|---|
0 | 1 | 20,45,50,60 |
1 | 2 | 22,70,35 |
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.