
Find duplicates in a list of lists with tuples

I am trying to find duplicate tuples nested within lists, which are themselves held in an outer list. If there is a better way to organize this data so the problem is easier to solve, I'd be glad to know, because I built this structure as I went along.

pairsList = [
                [1, (11, 12), (13, 14)], #list1
                [2, (21, 22), (23, 24)], #list2
                [3, (31, 32), (13, 14)], #list3
                [4, (43, 44), (21, 22)], #list4
               ]

The first element in each list uniquely identifies each list.

From this object, pairsList, I want to find out which lists have identical tuples. I want to report that list1 has the same tuple as list3 (because both have (13, 14)). Likewise, list2 and list4 have the same tuple (both have (21, 22)) and need to be reported. The position of a tuple within its list doesn't matter (list2 and list4 both have (21, 22) even though the tuple's position in each list is different).

The output result could be anything iterable later on, such as (1,3),(2,4) or [1,3],[2,4]. It is the pairs I am interested in.

I am aware of sets and have used them to delete duplicates within the lists in other situations, but cannot understand how to solve this problem. I can check like this if one list contains any element from the other list:

list1 = [1, (11, 12), (13, 14)]
list2 = [3, (31, 32), (13, 14)]
print(not set(list1).isdisjoint(list2))
# prints: True

So the code below tells me which lists have the same tuple(s) as the first one. But what is the correct way to perform this check across all the lists?

list0 = pairsList[0]
for iterList in pairsList:
    if not set(list0).isdisjoint(iterList):
        print(iterList[0])  # print list ID

"The first element in each list uniquely identifies each list."

Great, then let's convert it to a dict first:

d = {x[0]: x[1:] for x in pairsList}

# d: 
{1: [(11, 12), (13, 14)],
 2: [(21, 22), (23, 24)],
 3: [(31, 32), (13, 14)],
 4: [(43, 44), (21, 22)]}

Let's index the whole data structure:

index = {}
for k, vv in d.items():
    for v in vv:
        index.setdefault(v, []).append(k)

Now index is:

{(11, 12): [1],
 (13, 14): [1, 3],
 (21, 22): [2, 4],
 (23, 24): [2],
 (31, 32): [3],
 (43, 44): [4]}
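
The same index can also be built straight from pairsList, skipping the intermediate dict. The following is a minimal sketch, assuming Python 3; it swaps `setdefault` for `collections.defaultdict`, and the loop variable names `list_id` and `tuples` are illustrative choices, not part of the original code.

```python
from collections import defaultdict

pairsList = [
    [1, (11, 12), (13, 14)],  # list1
    [2, (21, 22), (23, 24)],  # list2
    [3, (31, 32), (13, 14)],  # list3
    [4, (43, 44), (21, 22)],  # list4
]

# Map each tuple to the IDs of the lists that contain it.
index = defaultdict(list)
for list_id, *tuples in pairsList:  # first element is the ID, the rest are tuples
    for t in tuples:
        index[t].append(list_id)

print(dict(index))
```

Each tuple is visited once, so this avoids comparing every list against every other list.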

"The output result could be anything iterable later on such as (1,3),(2,4) or [1,3],[2,4]. It is the pairs I am interested in."

pairs = [v for v in index.values() if len(v) == 2]

This returns [[1, 3], [2, 4]].
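
Note that the `len(v) == 2` filter skips any tuple shared by three or more lists. If that can happen in your data, one possible extension is to expand every group of IDs into all of its pairs with `itertools.combinations`. The sketch below assumes Python 3; the function name `find_shared_pairs` is hypothetical and only used for illustration.

```python
from collections import defaultdict
from itertools import combinations


def find_shared_pairs(rows):
    """Return sorted ID pairs of rows that share at least one tuple."""
    index = defaultdict(set)
    for row_id, *tuples in rows:
        for t in tuples:
            index[t].add(row_id)

    pairs = set()
    for ids in index.values():
        # A group of n IDs yields every 2-ID combination; n == 1 yields none.
        pairs.update(combinations(sorted(ids), 2))
    return sorted(pairs)


pairsList = [
    [1, (11, 12), (13, 14)],  # list1
    [2, (21, 22), (23, 24)],  # list2
    [3, (31, 32), (13, 14)],  # list3
    [4, (43, 44), (21, 22)],  # list4
]

result = find_shared_pairs(pairsList)
print(result)  # [(1, 3), (2, 4)]
```

Using a set for the collected pairs also means two lists that share more than one tuple are reported only once.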
