简体   繁体   中英

remove redundancies in a list with tuple of three elements

I have a list of tuples similar to A:

 A = [[(90, 1, 5), (126, 1, 3), (139, 1, 3), (1000, 1, 5), (111, 1, 2), (176, 1, 5)], 
[(160, 2, 5), (1000, 2, 5), (111, 1, 2)], 
[(134, 3, 5), (126, 1, 3), (128, 3, 4), (139, 1, 3)], 
[(128, 3, 4)], 
[(90, 1, 5), (160, 2, 5), (134, 3, 5), (1000, 2, 5), (1000, 1, 5), (176, 1, 5)]]

In each row of this list, there might be tuples which their second and third elements are the same. For example in A[0]:

A[0] = [(90, 1, 5), (126, 1, 3), (139, 1, 3), (1000, 1, 5), (111, 1, 2), (176, 1, 5)]

(90, 1, 5), (1000, 1, 5) and (176, 1, 5) have the same second and third elements. Among these, I need to keep the one which has the max value for the first element and remove the other two. So, I should be able to keep (1000, 1, 5) and remove (90, 1, 5) and (176, 1, 5) from A[0].

It would be better to keep the ordering of the list.

Is there any way to do that iteratively for all the rows in A? Any help would be appreciated!

If I understand correctly, here's an itertools.groupby solution. I'm assuming order in the final result does not matter.

from itertools import groupby

def keep_max(lst, groupkey, maxkey):
    'groups lst w.r.t. to groupkey, keeps maximum of each group w.r.t. maxkey'
    sor = sorted(lst, key=groupkey)
    groups = (tuple(g) for _, g in groupby(sor, key=groupkey))
    return [max(g, key=maxkey) for g in groups]

In action:

>>> from operator import itemgetter
>>> groupkey = itemgetter(1, 2)
>>> maxkey = itemgetter(0)
>>> A = [[(90, 1, 5), (126, 1, 3), (139, 1, 3), (1000, 1, 5), (111, 1, 2), (176, 1, 5)], [(160, 2, 5), (1000, 2, 5), (111, 1, 2)], [(134, 3, 5), (126, 1, 3), (128, 3, 4), (139, 1, 3)], [(128, 3, 4)], [(90, 1, 5), (160, 2, 5), (134, 3, 5), (1000, 2, 5), (1000, 1, 5), (176, 1, 5)]]
>>>
>>> [keep_max(sub, groupkey, maxkey) for sub in A]
[[(111, 1, 2), (139, 1, 3), (1000, 1, 5)],
 [(111, 1, 2), (1000, 2, 5)],
 [(139, 1, 3), (128, 3, 4), (134, 3, 5)],
 [(128, 3, 4)],
 [(1000, 1, 5), (1000, 2, 5), (134, 3, 5)]]

This solution keeps the original ordering of the tuples assuming each tuple (as a whole) is unique; in the case there are duplicates tuples this will return the last appearance of each tuple:

from operator import itemgetter

A = [[(90, 1, 5), (126, 1, 3), (139, 1, 3), (1000, 1, 5), (111, 1, 2), (176, 1, 5)],
     [(160, 2, 5), (1000, 2, 5), (111, 1, 2)],
     [(134, 3, 5), (126, 1, 3), (128, 3, 4), (139, 1, 3)],
     [(128, 3, 4)],
     [(90, 1, 5), (160, 2, 5), (134, 3, 5), (1000, 2, 5), (1000, 1, 5), (176, 1, 5)]]


def uniques(lst):
    groups = {}

    for t in lst:
        groups.setdefault(t[1:], []).append(t)

    lookup = {t: i for i, t in enumerate(lst)}
    index = lookup.get

    first = itemgetter(0)
    return sorted(map(lambda x: max(x, key=first), groups.values()), key=index)


result = [uniques(a) for a in A]
print(result)    

Output

[[(139, 1, 3), (1000, 1, 5), (111, 1, 2)], [(1000, 2, 5), (111, 1, 2)], [(134, 3, 5), (128, 3, 4), (139, 1, 3)], [(128, 3, 4)], [(134, 3, 5), (1000, 2, 5), (1000, 1, 5)]]

Using dictionaries:

fin = []
for row in A:
    dict = {}
    for tup in row:
        dict[tup[1:2]] = tup[0]
    fin.append(dict)
A = [[value, t1, t1] for (t1, t2), value in dict.iteritems()]

Using this, your dict will transform A[0] from

A[0] = [(90, 1, 5), (126, 1, 3), (139, 1, 3), (1000, 1, 5), (111, 1, 2), (176, 1, 5)]

to

{ (1,5): 1000, (1,3): 139, (1,2): 111 } (as a dict)

and can then be converted back to an array using iteritems

This way, the order will also be preserved.

If you can afford to ignore ordering, you can use itertools.groupby to group elements by the 2nd and 3rd elements on a list sorted by ascending order of 2nd and 3rd elements and descending order of the first element. Then the first element of each group is your desired choice:

from itertools import groupby

A = [[(90, 1, 5), (126, 1, 3), (139, 1, 3), (1000, 1, 5), (111, 1, 2), (176, 1, 5)], 
     [(160, 2, 5), (1000, 2, 5), (111, 1, 2)], 
     [(134, 3, 5), (126, 1, 3), (128, 3, 4), (139, 1, 3)], 
     [(128, 3, 4)], 
     [(90, 1, 5), (160, 2, 5), (134, 3, 5), (1000, 2, 5), (1000, 1, 5), (176, 1, 5)]]

def max_duplicate(lst):
    res = []
    for k, g in groupby(sorted(lst, key=lambda x: (x[1], x[2], -x[0])), key=lambda x: (x[1], x[2])):
        res.append(next(g))
    return res

result = [max_duplicate(l) for l in A]
for r in result:
    print(r)

Output

[(111, 1, 2), (139, 1, 3), (1000, 1, 5)]
[(111, 1, 2), (1000, 2, 5)]
[(139, 1, 3), (128, 3, 4), (134, 3, 5)]
[(128, 3, 4)]
[(1000, 1, 5), (1000, 2, 5), (134, 3, 5)]

You can do this by using a hashmap as follows:

d = {}
for a in A:
    for aa in a:
        v, k1, k2 = aa
        if (k1, k2) in d:
            d[(k1, k2)] = max(v, d[(k1, k2)])
        else:
            d[(k1, k2)] = v

l = [[v, k1, k2] for (k1, k2), v in d.iteritems()]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM