How to identify partial tuple matches in a list of tuples

Question

If I have a dictionary consisting of a list of tuples like so:

d = {'ENSG00000186092': [('ENST00000641515', '3'),
                        ('ENST00000641515', '1'),
                        ('ENST00000641515', '2'),
                        ('ENST00000335137', '1')],
    'ENSG00000284662': [('ENST00000332831', '1')],
    'ENSG00000284733': [('ENST00000426406', '1')]}

How can I identify if, for each key, there are tuples whereby the first element of the tuples don't match but the second elements do?

For instance, in the above example, we would only see one "hit", and that'd be for the key ENSG00000186092 because of:

('ENST00000641515', '1')
('ENST00000335137', '1')

Answer 1

Would a convoluted list-comprehension be of interest to you?

[
  k for k, v in d.items()
  if any(
    (i, j)
    for i, j in v
    for x, y in v
    if i != x and j == y
  )
]
>>> ['ENSG00000186092']

Loops over the dictionary
Loops over the list of tuples for each key
For each tuple, loops through the same list and checks to see that the first entries don't match but the second do
If there are any found then record that key from step 2.

Answer 2

Being very verbose:

d = {
    "ENSG00000186092": [
        ("ENST00000641515", "3"),
        ("ENST00000641515", "1"),
        ("ENST00000641515", "2"),
        ("ENST00000335137", "1"),
    ],
    "ENSG00000284662": [("ENST00000332831", "1")],
    "ENSG00000284733": [("ENST00000426406", "1")],
}

def has_duplicates(list_of_tuples):
    seen = set()
    for _, value in list_of_tuples:
        if value in seen:
            return True
        seen.add(value)
    return False

dupes = [key for key, value in d.items() if has_duplicates(value)]

print(dupes)

The has_duplicates function takes a value from your dictionary. It returns True if any of the tuples in that value have the same second value.

The list comprehension at the return gives you all the keys were has_duplicates is True.

Answer 3

d = {'ENSG00000186092': [('ENST00000641515', '3'),
                        ('ENST00000641515', '1'),
                        ('ENST00000641515', '2'),
                        ('ENST00000335137', '1')],
    'ENSG00000284662': [('ENST00000332831', '1')],
    'ENSG00000284733': [('ENST00000426406', '1')]}

for k, a in d.items():
    a_s = sorted(a, key=lambda x: (x[1], x[0]))
    for i in range(len(a_s)-1):
        if a_s[i][1] == a_s[i+1][1] and a_s[i][0] != a_s[i+1][0]:
            print (k, a_s[i], a_s[i+1])

Sort the tuples by second element then by first element of the tuples so that the tuples come together by second element and then by first element
Check the condition between current tuple and next tuple in the sorted list and print them if condition is met
Time Complexity if d has k items and list size is 'n' then it is O(k*nlogn) [ k for outer loop and nlogn for sorting]

How to identify partial tuple matches in a list of tuples

Question

3 answers

solution1
1 2019-03-27 23:14:38

solution2
0 2019-03-27 23:22:28

solution3
0 2019-03-27 23:28:19

How to identify partial tuple matches in a list of tuples

Question

3 answers

solution1 1 2019-03-27 23:14:38

solution2 0 2019-03-27 23:22:28

solution3 0 2019-03-27 23:28:19

solution1
1 2019-03-27 23:14:38

solution2
0 2019-03-27 23:22:28

solution3
0 2019-03-27 23:28:19