
Comparing elements of tuples in a list

I am trying to write code that compares the second element of each tuple and extracts the tuples whose second element is duplicated.

For example, if I have

List = [(0, 2), (1, 0), (2, 1), (3, 2)]

duplicate_tuples = [(0, 2), (3, 2)]  # desired output

I just cannot figure out how to refer to the second element in my for loop:

for i in List: # would iterate each tuple
    if i[1] of i in List is duplicate...

My lack of Pythonic syntax is frustrating. How should I approach this problem?

You can group your tuples by their second element in a collections.defaultdict(), then report the groups that contain more than one tuple:

from collections import defaultdict

lst = [(0, 2), (1, 0), (2, 1), (3, 2), (2, 0)]

dups = defaultdict(list)
for fst, snd in lst:
    dups[snd].append((fst, snd))

print([v for k, v in dups.items() if len(v) > 1])
# [[(0, 2), (3, 2)], [(1, 0), (2, 0)]]

Or keep the duplicates in a dictionary for easy lookups:

print({k: v for k, v in dups.items() if len(v) > 1})
# {2: [(0, 2), (3, 2)], 0: [(1, 0), (2, 0)]}
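
If you want the flat list from the question rather than groups, one option is to count the second elements first and then filter. This is only a sketch: collections.Counter is swapped in for the defaultdict above, and the names counts and duplicate_tuples are just illustrative.

```python
from collections import Counter

lst = [(0, 2), (1, 0), (2, 1), (3, 2)]

# Count how often each second element occurs.
counts = Counter(snd for _, snd in lst)

# Keep the tuples whose second element occurs more than once,
# preserving the original order.
duplicate_tuples = [t for t in lst if counts[t[1]] > 1]

print(duplicate_tuples)
# [(0, 2), (3, 2)]
```

This makes two passes over the list, so it stays linear in the number of tuples.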

Working with NumPy arrays instead of lists of tuples can be more efficient, especially for large inputs.

import numpy as np
a = np.array([(0, 2), (1, 0), (2, 1), (3, 2), (3, 0)])

unique_vals, counts = np.unique(a[:, 1], return_counts=True)

Based on the output of np.unique, we can build the list of duplicates:

# Select rows by comparing the column values themselves with each repeated value.
duplicates = [(int(v), a[a[:, 1] == v]) for v in unique_vals[counts > 1]]

Output:

[(0, array([[1, 0],[3, 0]])),
 (2, array([[0, 2],[3, 2]]))]
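
If you only need the duplicated rows in their original order, without grouping them by value, a boolean mask may be enough. A minimal sketch, assuming the same array as above; mask and dup_rows are illustrative names:

```python
import numpy as np

a = np.array([(0, 2), (1, 0), (2, 1), (3, 2), (3, 0)])

# Unique second-column values and how often each occurs.
unique_vals, counts = np.unique(a[:, 1], return_counts=True)

# Boolean mask: True where the row's second element is a repeated value.
mask = np.isin(a[:, 1], unique_vals[counts > 1])

dup_rows = a[mask]
print(dup_rows.tolist())
# [[0, 2], [1, 0], [3, 2], [3, 0]]
```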

There may be more than one group of duplicates, so itertools.groupby is a good fit.

In [6]: from itertools import groupby
In [7]: for g,l in groupby(sorted(lst,key=lambda x:x[1]),key=lambda x:x[1]):
   ...:     temp = list(l)
   ...:     if len(temp) > 1:
   ...:         print(g, temp)
   ...:   
2 [(0, 2), (3, 2)]
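
Note that groupby only merges consecutive items with the same key, which is why the list is sorted first. A small sketch of the difference, assuming lst is the list from the question (the helper name dup_groups is made up for illustration):

```python
from itertools import groupby
from operator import itemgetter

lst = [(0, 2), (1, 0), (2, 1), (3, 2)]
second = itemgetter(1)  # key function: the second element of a tuple


def dup_groups(items):
    """Return {key: group} for runs of more than one consecutive item."""
    result = {}
    for key, group in groupby(items, key=second):
        group = list(group)
        if len(group) > 1:
            result[key] = group
    return result


print(dup_groups(lst))                      # {} (the two 2s are not adjacent)
print(dup_groups(sorted(lst, key=second)))  # {2: [(0, 2), (3, 2)]}
```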

Here is another approach, using numpy:

import numpy as np

duplicate_list = []

foo = np.array([(0, 2), (1, 0), (2, 1), (3, 2), (3, 0), (1, 2)])

# Iterate over the unique values themselves, not over range(len(...))
for i in np.unique(foo[:, 1]):
    if np.sum(foo[:, 1] == i) > 1:
        duplicate_list.append(foo[foo[:, 1] == i].tolist())

print(duplicate_list)

Output:

[[[1, 0], [3, 0]], [[0, 2], [3, 2], [1, 2]]]

np.unique(foo[:,1]) gives the unique values found in the second position of the tuples. For each of those values, the matching rows are appended to the list if the value occurs more than once. Here this returns 2 lists because 2 values are duplicated (0 and 2). If you already know the specific value, say 2, you can avoid the loop.

For example:

bla = np.array([(0, 2), (1, 0), (2, 1), (3, 2)])
duplicate = []

if np.sum(bla[:,1] == 2) > 1:
    duplicate = bla[bla[:,1] == 2].tolist()

print(duplicate)

Output:

[[0, 2], [3, 2]]
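
The same single-value check also works on a plain list of tuples without numpy, since t[1] refers to the second element of a tuple t. A rough sketch; target and matches are hypothetical names:

```python
lst = [(0, 2), (1, 0), (2, 1), (3, 2)]
target = 2  # hypothetical value to look for in the second position

# t[1] is the second element of each tuple.
matches = [t for t in lst if t[1] == target]

# Only report them if the value really occurs more than once.
duplicate = matches if len(matches) > 1 else []

print(duplicate)
# [(0, 2), (3, 2)]
```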
