简体   繁体   中英

How to identify partial tuple matches in a list of tuples

If I have a dictionary consisting of a list of tuples like so:

d = {'ENSG00000186092': [('ENST00000641515', '3'),
                        ('ENST00000641515', '1'),
                        ('ENST00000641515', '2'),
                        ('ENST00000335137', '1')],
    'ENSG00000284662': [('ENST00000332831', '1')],
    'ENSG00000284733': [('ENST00000426406', '1')]}

How can I identify if, for each key, there are tuples whereby the first element of the tuples don't match but the second elements do?

For instance, in the above example, we would only see one "hit", and that'd be for the key ENSG00000186092 because of:

('ENST00000641515', '1')
('ENST00000335137', '1')

Would a convoluted list-comprehension be of interest to you?

[
  k for k, v in d.items()
  if any(
    (i, j)
    for i, j in v
    for x, y in v
    if i != x and j == y
  )
]
>>> ['ENSG00000186092']
  1. Loops over the dictionary
  2. Loops over the list of tuples for each key
  3. For each tuple, loops through the same list and checks to see that the first entries don't match but the second do
  4. If there are any found then record that key from step 2.

Being very verbose:

d = {
    "ENSG00000186092": [
        ("ENST00000641515", "3"),
        ("ENST00000641515", "1"),
        ("ENST00000641515", "2"),
        ("ENST00000335137", "1"),
    ],
    "ENSG00000284662": [("ENST00000332831", "1")],
    "ENSG00000284733": [("ENST00000426406", "1")],
}

def has_duplicates(list_of_tuples):
    seen = set()
    for _, value in list_of_tuples:
        if value in seen:
            return True
        seen.add(value)
    return False

dupes = [key for key, value in d.items() if has_duplicates(value)]

print(dupes)

The has_duplicates function takes a value from your dictionary. It returns True if any of the tuples in that value have the same second value.

The list comprehension at the return gives you all the keys were has_duplicates is True.

d = {'ENSG00000186092': [('ENST00000641515', '3'),
                        ('ENST00000641515', '1'),
                        ('ENST00000641515', '2'),
                        ('ENST00000335137', '1')],
    'ENSG00000284662': [('ENST00000332831', '1')],
    'ENSG00000284733': [('ENST00000426406', '1')]}

for k, a in d.items():
    a_s = sorted(a, key=lambda x: (x[1], x[0]))
    for i in range(len(a_s)-1):
        if a_s[i][1] == a_s[i+1][1] and a_s[i][0] != a_s[i+1][0]:
            print (k, a_s[i], a_s[i+1])
  • Sort the tuples by second element then by first element of the tuples so that the tuples come together by second element and then by first element
  • Check the condition between current tuple and next tuple in the sorted list and print them if condition is met
  • Time Complexity if d has k items and list size is 'n' then it is O(k*nlogn) [ k for outer loop and nlogn for sorting]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM