How to find exact matches between strings in sublists

I have two lists composed of sublists, something like:

 conopt = [["element1","element2"],["bla"]] 
 mat = [["element1","elementA"],["bla & etc"]] 

From that data I want to fill a matrix with dimensions len(conopt) x len(mat). Since Python has no built-in matrix type, I'll use another list of sublists:

finalmat = [ ["-","X",...,"X"],[...]]

I want finalmat to have an "X" where there is a full match (a string in a sublist of conopt is identical to a string in a sublist of mat) and a "-" where there isn't. However, I only care about the first full match for each sublist: if a sublist of conopt has one or more matches in a sublist of mat, the result should be the same, a single "X".

I've tried the following:

for i in mat:
    for j in conopt:
         for item in j:
             if item in i:
                 finalmat[mat.index(i)][conopt.index(j)] = "X"

However, the result is not correct: I manually checked some data points and the output doesn't match what I expect.


Some extra (less important) info:

  • the string elements are composed of letters, numbers, spaces (only between words) and the characters "#" and "&".

  • the sublists have an arbitrary number of strings.

  • this data comes from an Excel file. I've manually extracted and modified it to match the Python syntax.

  • the output (finalmat) is going back to Excel. I'm doing this step manually because this is a one-off task and I don't want to complicate my code even more.

mat.index(i) and conopt.index(j) return the position of the first sublist that is equal to i or j, so if two sublists have the same contents the "X" lands in the wrong cell. I suggest using enumerate, which gives you the position directly. Also make sure you correctly initialize finalmat.

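# One row per sublist of mat, one column per sublist of conopt, all "-" to start.
finalmat = [["-" for j in range(len(conopt))] for i in range(len(mat))]
for indexmat, itemmat in enumerate(mat):
    for indexconopt, itemconopt in enumerate(conopt):
        for item in itemconopt:
            # "in" on a list of strings compares whole strings, so only exact matches count.
            if item in itemmat:
                finalmat[indexmat][indexconopt] = "X"
                break  # the first match is enough for this cell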
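
For reference, here is a minimal self-contained sketch of the same idea run on the sample data from the question. The function name match_matrix is made up for illustration, and the use of a set plus any() is a substitution for the innermost loop, not something the question asked for. As in the loop above, rows follow mat and columns follow conopt; swap the two loops if you need the len(conopt) x len(mat) orientation instead.

```python
def match_matrix(conopt, mat):
    """Build a len(mat) x len(conopt) grid of "X"/"-" markers.

    A cell is "X" when at least one string of the conopt sublist is
    identical to a string of the mat sublist, and "-" otherwise.
    (Hypothetical helper, named here only for illustration.)
    """
    finalmat = []
    for mat_sub in mat:
        mat_strings = set(mat_sub)  # exact-string membership test
        row = [
            # any() stops at the first match, so extra matches change nothing
            "X" if any(s in mat_strings for s in con_sub) else "-"
            for con_sub in conopt
        ]
        finalmat.append(row)
    return finalmat


conopt = [["element1", "element2"], ["bla"]]
mat = [["element1", "elementA"], ["bla & etc"]]

finalmat = match_matrix(conopt, mat)
for row in finalmat:
    print(row)
```

With this data only the first cell is marked: "element1" appears in both first sublists, while "bla" is not identical to "bla & etc" and therefore does not count as a full match.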
