Print the best match in a Python list where each element is internally delimited
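I build a Python list from the rows of a file: when the element row[0] occurs in row[3], both are appended to the list 'matches', and vice versa, when the element row[3] occurs in row[0], they are appended to 'matches'. The list looks like this: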
['Peripheral Blood Mononuclear Cells;peripheral blood mononuclear cells', 'Blood;peripheral blood mononuclear cells', 'Hispanic or Latino;hispanic', 'Black;black', 'Black;black', 'Asian;asian', 'Asian;asian', 'Asian;caucasian', 'caucasian;caucasian', 'caucasian;caucasian', 'Seizures;seizures', 'Seizure;seizures', 'Seizures;seizures', 'Seizures;seizures', 'Abscess;abscess']
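For context, here is a minimal sketch of how such a list might be built. The function name `build_matches`, the four-column row layout, and the case-insensitive substring test are assumptions made for illustration; the question does not show the actual file-reading code.

```python
def build_matches(rows, sep=";"):
    """Join row[0] and row[3] with sep when either one contains the other.

    Hypothetical helper: the containment test is case-insensitive, which is
    an assumption consistent with pairs such as 'Asian;caucasian' in the list.
    """
    matches = []
    for row in rows:
        left, right = row[0], row[3]
        if left.lower() in right.lower() or right.lower() in left.lower():
            matches.append(f"{left}{sep}{right}")
    return matches


# Made-up rows; only columns 0 and 3 matter here.
rows = [
    ["Blood", "", "", "peripheral blood mononuclear cells"],
    ["Hispanic or Latino", "", "", "hispanic"],
    ["Asian", "", "", "caucasian"],
    ["Abscess", "", "", "seizures"],
]
print(build_matches(rows))
```

Note that 'Asian' pairs with 'caucasian' only because "asian" is a substring of "caucasian", which is the kind of loose match the question wants to filter out afterwards.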
I want to print only the first occurrence for each element, or the exact match, regardless of what follows:
['Peripheral Blood Mononuclear Cells;peripheral blood mononuclear', 'Hispanic or Latino;hispanic', 'Black;black', 'Asian;asian', 'caucasian;caucasian', 'Seizures;seizures', 'Abscess;abscess']
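Here, if you notice, each element of the list is internally separated by ";". I am trying to use that as the criterion for comparison. I want only the first occurrence of each element, based on the word or words after the ";", or the occurrence where the words on both sides are identical. For example, for peripheral blood mononuclear cells it picks the first occurrence, whereas for caucasian it picks the second occurrence because that one is a perfect match. I would really appreciate any help before downvoting.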
You need to keep track of every full string and every split substring you have seen, and only add what has not been seen yet:
l=['Peripheral Blood Mononuclear Cells;peripheral blood mononuclear cells', 'Blood;peripheral blood mononuclear cells', 'Hispanic or Latino;hispanic', 'Black;black', 'Black;black', 'Asian;asian', 'Asian;asian', 'Asian;caucasian', 'caucasian;caucasian', 'caucasian;caucasian', 'Seizures;seizures', 'Seizure;seizures', 'Seizures;seizures', 'Seizures;seizures', 'Abscess;abscess']
seen = set()
res = []
for ele in l:
    a, b = ele.split(";", 1)
    # add when neither the full string nor the left/right substring has been seen,
    # or when both sides match exactly and that perfect match is not already added
    # (the lowercased full string is checked because seen only holds lowercase keys)
    if (ele.lower() not in seen and not any(x.lower() in seen for x in (a, b))) or (a == b and ele.lower() not in seen):
        res.append(ele)
    # keep track of all full strings and left/right substrings
    seen.update([a.lower(), b.lower(), ele.lower()])
print(res)
['Peripheral Blood Mononuclear Cells;peripheral blood mononuclear cells', 'Hispanic or Latino;hispanic', 'Black;black', 'Asian;asian', 'caucasian;caucasian', 'Seizures;seizures', 'Abscess;abscess']
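The same seen-set idea can be wrapped in a reusable function. This is a sketch rather than a definitive version: the name `first_or_exact` and the `sep` parameter are hypothetical. It also assumes that the exact-match check should compare the lowercased full string, so that a repeated mixed-case exact pair such as 'Black;Black' is added only once.

```python
def first_or_exact(items, sep=";"):
    """Keep the first occurrence per key, plus the first exact left == right pair."""
    seen = set()  # lowercased full strings and lowercased left/right parts
    result = []
    for item in items:
        left, right = item.split(sep, 1)
        key = item.lower()
        # Nothing about this item has been seen before.
        is_new = (
            key not in seen
            and left.lower() not in seen
            and right.lower() not in seen
        )
        # Both sides are identical (case-sensitive) and this pair is not yet kept.
        is_exact = left == right and key not in seen
        if is_new or is_exact:
            result.append(item)
        seen.update((left.lower(), right.lower(), key))
    return result


sample = [
    "Asian;asian", "Asian;asian", "Asian;caucasian",
    "caucasian;caucasian", "caucasian;caucasian",
    "Seizures;seizures", "Seizure;seizures",
]
print(first_or_exact(sample))
```

'Asian;caucasian' is skipped because 'asian' was already seen, while 'caucasian;caucasian' is still kept once because both sides are identical.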