
Print the best match in a Python list, where each element is separated internally

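I build a Python list from the rows of a file: when the element row[0] occurs in row[3], I append both to a list called 'matches', and vice versa, when row[3] occurs in row[0], both are appended to 'matches'. The resulting list looks like this: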

['Peripheral Blood Mononuclear Cells;peripheral blood mononuclear cells', 'Blood;peripheral blood mononuclear cells', 'Hispanic or Latino;hispanic', 'Black;black', 'Black;black', 'Asian;asian', 'Asian;asian', 'Asian;caucasian', 'caucasian;caucasian', 'caucasian;caucasian', 'Seizures;seizures', 'Seizure;seizures', 'Seizures;seizures', 'Seizures;seizures', 'Abscess;abscess']
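For context, a list like the one above might be built roughly as follows. This is only a sketch: the question does not show the file, so the tab delimiter, the sample rows, the helper name build_matches, and the case-insensitive containment test are all assumptions made for illustration.

```python
import csv
import io

def build_matches(rows):
    """Collect 'row[0];row[3]' for rows where one field contains the other."""
    matches = []
    for row in rows:
        if len(row) < 4:
            continue  # skip rows that do not have a fourth column
        left, right = row[0], row[3]
        if not left or not right:
            continue  # an empty string is contained in every string, so skip it
        # Assumption: containment is tested case-insensitively, in both directions.
        if left.lower() in right.lower() or right.lower() in left.lower():
            matches.append(left + ";" + right)
    return matches

# Hypothetical tab-separated input standing in for the real file.
sample = io.StringIO(
    "Blood\tx\ty\tperipheral blood mononuclear cells\n"
    "Asian\tx\ty\tcaucasian\n"
    "Black\tx\ty\tgreen\n"
)
matches = build_matches(csv.reader(sample, delimiter="\t"))
print(matches)
```

Note that a plain substring test also pairs 'Asian' with 'caucasian', since "caucasian" ends in "asian", which may be how that entry ended up in the list.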

I only want to print the first occurrence for each element, or the exact match, so that the output looks like this:

['Peripheral Blood Mononuclear Cells;peripheral blood mononuclear', 'Hispanic or Latino;hispanic', 'Black;black', 'Asian;asian', 'caucasian;caucasian', 'Seizures;seizures', 'Abscess;abscess']

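Notice that each element of the list is separated internally by ";", and I am trying to use that as the criterion for comparison. I only want the first occurrence of each element, based on the word or words after the ";", or the entry where the words on both sides are identical. For example, for peripheral blood mononuclear cells the first occurrence is chosen, whereas for caucasian the second occurrence is chosen because it is a perfect match. I would really appreciate any help before downvoting.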
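To make the criterion concrete, here is a small sketch of how a single entry can be split on ";" and classified. The function name describe and the three labels are hypothetical, chosen only to illustrate the distinction between a perfect match and a partial one.

```python
def describe(entry):
    """Classify one 'left;right' entry by how closely its two sides agree."""
    # Split on the first ";" only, so the right side keeps any later ";".
    left, right = entry.split(";", 1)
    if left == right:
        return "perfect"
    if left.lower() == right.lower():
        return "same words, different case"
    return "partial"

for entry in ["caucasian;caucasian", "Black;black", "Asian;caucasian"]:
    print(entry, "->", describe(entry))
```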

You need to keep track of every full string and every split substring seen so far, and only add entries that have not been seen yet:

l=['Peripheral Blood Mononuclear Cells;peripheral blood mononuclear cells', 'Blood;peripheral blood mononuclear cells', 'Hispanic or Latino;hispanic', 'Black;black', 'Black;black', 'Asian;asian', 'Asian;asian', 'Asian;caucasian', 'caucasian;caucasian', 'caucasian;caucasian', 'Seizures;seizures', 'Seizure;seizures', 'Seizures;seizures', 'Seizures;seizures', 'Abscess;abscess']
seen = set()
res = []
for ele in l:
    a,b = ele.split(";",1)
    # add ele only if neither the full string nor its left/right substring has been seen,
    # or if both sides match exactly and that perfect match has not been added yet
    if (ele.lower() not in seen and not any(x.lower() in seen for x in (a, b))) or (a == b and ele.lower() not in seen):
        res.append(ele)
    # keep track of all full strings and left/right substrings (lowercased)
    seen.update([a.lower(), b.lower(), ele.lower()])
print(res)

Output:
['Peripheral Blood Mononuclear Cells;peripheral blood mononuclear cells', 'Hispanic or Latino;hispanic', 'Black;black', 'Asian;asian', 'caucasian;caucasian', 'Seizures;seizures', 'Abscess;abscess']
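The same idea can be wrapped in a reusable function. The sketch below is one possible packaging, not the answer's own code: the name first_or_exact and the sep parameter are hypothetical, and the logic assumes every entry contains the separator at least once.

```python
def first_or_exact(entries, sep=";"):
    """Keep the first entry per term, plus any not-yet-added perfect match."""
    seen = set()
    result = []
    for entry in entries:
        left, right = entry.split(sep, 1)
        key = entry.lower()
        # Nothing about this entry has been seen: not the whole string, not either side.
        unseen = (key not in seen
                  and left.lower() not in seen
                  and right.lower() not in seen)
        # Both sides are identical and this exact pair has not been recorded yet.
        new_perfect = left == right and key not in seen
        if unseen or new_perfect:
            result.append(entry)
        # Record the full string and both halves, all lowercased.
        seen.update((left.lower(), right.lower(), key))
    return result

data = ['Asian;asian', 'Asian;caucasian', 'caucasian;caucasian',
        'caucasian;caucasian', 'Seizures;seizures', 'Seizure;seizures']
result = first_or_exact(data)
print(result)
# ['Asian;asian', 'caucasian;caucasian', 'Seizures;seizures']
```

An entry without the separator would raise a ValueError when unpacking, so real input may need to be validated first.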
