
Efficient way to find similar items in a list in Python

I have a list of lists as follows:

list_1 = [[[1,'a'],[2,'b']], [[3,'c'],[4,'d']], [[1,'a'],[5,'d']], [[8,'r'],[10,'u']]]

I am trying to find whether an element in this list is similar to another element. Right now I loop over the list twice, i.e. for each element, I check it against the rest. My output is:

[[[1,'a'],[2,'b']], [[1,'a'],[5,'d']]]

Is there a way to do this more efficiently?

Thanks.

You can use itertools.combinations together with the built-in any function, like this:

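from itertools import combinations

list_1 = [[[1,'a'],[2,'b']], [[3,'c'],[4,'d']], [[1,'a'],[5,'d']], [[8,'r'],[10,'u']]]

# Compare every unordered pair of elements exactly once.
for item in combinations(list_1, 2):
    # The two elements are similar if any pair of the first appears in the second.
    if any(i in item[1] for i in item[0]):
        print(item)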

Output

([[1, 'a'], [2, 'b']], [[1, 'a'], [5, 'd']])
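
The loop above prints matching pairs as tuples, while the question asks for a flat list of the similar elements. A minimal sketch of collecting them is shown below; the function name similar_elements is hypothetical and not part of the original answer. Note that this still compares every pair, so it remains quadratic in the number of elements.

```python
from itertools import combinations

list_1 = [[[1, 'a'], [2, 'b']], [[3, 'c'], [4, 'd']],
          [[1, 'a'], [5, 'd']], [[8, 'r'], [10, 'u']]]

def similar_elements(items):
    """Return the elements that share at least one pair with another element.

    Hypothetical helper for illustration; it keeps the original list order.
    """
    matched = set()  # indices of elements that appear in at least one match
    for (i, a), (j, b) in combinations(enumerate(items), 2):
        if any(pair in b for pair in a):
            matched.update((i, j))
    return [items[i] for i in sorted(matched)]

result = similar_elements(list_1)
print(result)
```

Tracking indices rather than the elements themselves avoids having to hash the nested lists.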

I'm assuming that, by similar, you mean that two elements have at least one matching pair between them. In this case, rather than using a nested loop, you could add each element to a dict of lists twice (once for each [number, str] pair within it). When you finish, each key in the dict maps to the list of elements that contain that pair (i.e., are similar).

Example code:

list_1 = [[[1,'a'],[2,'b']], [[3,'c'],[4,'d']], [[1,'a'],[5,'d']], [[8,'r'],[10,'u']]]

# Maps a "<number><letter>" key to the list of elements containing that pair.
d = {}

for elt in list_1:
    # Register the element once under each of its [number, str] pairs.
    for number, letter in elt:
        key = '%d%s' % (number, letter)
        d.setdefault(key, []).append(elt)

for key in d:
    print('%s : %s' % (key, d[key]))

Example output (the order of the keys may vary between Python versions):

1a : [[[1, 'a'], [2, 'b']], [[1, 'a'], [5, 'd']]]
8r : [[[8, 'r'], [10, 'u']]]
2b : [[[1, 'a'], [2, 'b']]]
3c : [[[3, 'c'], [4, 'd']]]
5d : [[[1, 'a'], [5, 'd']]]
4d : [[[3, 'c'], [4, 'd']]]
10u : [[[8, 'r'], [10, 'u']]]

Any dict entry whose list has more than one element contains similar elements. This reduces the runtime complexity of your code to O(n) on average, since each dict lookup and insertion takes constant time on average, assuming you have a way to obtain a string representation of a, b, c, etc.
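
As a variation on the code above, the sketch below uses collections.defaultdict with tuple keys instead of formatted strings. This is an alternative and not the original answer's code, and the names group_by_pair and similar are hypothetical. Tuple keys avoid collisions that string concatenation can cause (for example, 1 with '0a' and 10 with 'a' both format as '10a'), and they only require that the pair's members be hashable.

```python
from collections import defaultdict

list_1 = [[[1, 'a'], [2, 'b']], [[3, 'c'], [4, 'd']],
          [[1, 'a'], [5, 'd']], [[8, 'r'], [10, 'u']]]

def group_by_pair(items):
    """Map each (number, letter) tuple to the elements that contain it.

    Hypothetical helper; assumes the members of every pair are hashable.
    """
    groups = defaultdict(list)
    for elt in items:
        # Deduplicate pairs so an element repeating a pair is listed once.
        for key in {tuple(pair) for pair in elt}:
            groups[key].append(elt)
    return groups

groups = group_by_pair(list_1)

# Keep only the groups shared by more than one element.
similar = [elts for elts in groups.values() if len(elts) > 1]
print(similar)
```

Each group in similar holds the elements that share one particular pair, so an element can appear in more than one group if it matches different elements through different pairs.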

