How to compare each element of a delimited string in a pandas data frame column with the elements of a Python list
I have a data frame with a delimited string column that has to be compared with a list. If the elements of the delimited string and the elements of the list intersect, that row should be kept.
For example:
test_lst = [20, 45, 35]
data = pd.DataFrame({'colA': [1, 2, 3],
'colB': ['20,45,50,60', '22,70,35', '10,90,100']})
The output should be as follows, because the elements 20 and 45 appear both in the list and in the delimited text in the first row of the data frame. Likewise, 35 intersects in row 2.
colA | colB
---|---
1 | 20,45,50,60
2 | 22,70,35
What I have tried:
test_lst = [20, 45, 35]
data["colC"]= data['colB'].str.split(',')
data
# data["colC"].apply(lambda x: set(x).intersection(test_lst))
print(data[data['colC'].apply(lambda x: set(x).intersection(test_lst)).astype(bool)])
data
This does not give the required result.
Any help is appreciated.
This might not be the best approach, but it works.
import pandas as pd

df = pd.DataFrame({'colA': [1, 2, 3],
                   'colB': ['20,45,50,60', '22,70,35', '10,90,100']})

def match_element(row):
    # Convert the delimited string into a list of integers
    row_elements = [int(n) for n in row.split(',')]
    test_lst = [20, 45, 35]
    # True if at least one element also appears in test_lst
    return any(value in test_lst for value in row_elements)

mask = df['colB'].apply(match_element)
df = df[mask]
Output:
 | colA | colB
---|---|---
0 | 1 | 20,45,50,60
1 | 2 | 22,70,35
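
For what it's worth, the attempt in the question most likely fails because `str.split(',')` produces lists of strings (`'20'`, `'45'`, ...), while `test_lst` holds integers, so the set intersection is always empty. A minimal sketch of one way around this, assuming the delimited values contain no surrounding spaces, is to compare against a set of strings instead. The names `wanted`, `mask` and `result` are only illustrative and are not from the original post.

```python
import pandas as pd

test_lst = [20, 45, 35]
data = pd.DataFrame({'colA': [1, 2, 3],
                     'colB': ['20,45,50,60', '22,70,35', '10,90,100']})

# str.split yields lists of strings such as ['20', '45', ...], so build
# a set of strings to compare against (assumes no spaces around values).
wanted = {str(n) for n in test_lst}

# isdisjoint is True when nothing is shared, so negate it to keep rows
# that have at least one element in common with the list.
mask = data['colB'].str.split(',').apply(
    lambda parts: not wanted.isdisjoint(parts))

result = data[mask]
print(result)
```

This keeps `colB` untouched and avoids adding a helper column. Converting each piece with `int` as in the answer above is the safer choice if the strings may contain spaces or leading zeros.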