简体   繁体   中英

Maintaining the order of the elements in a frozen set

I have a list of tuples, each tuple of which contains one string and two integers. The list looks like this:

x = [('a',1,2), ('b',3,4), ('x',5,6), ('a',2,1)]

The list contains thousands of such tuples. Now if I want to get unique combinations, I can do the frozenset on my list as follows:

y = set(map(frozenset, x))

This gives me the following result:

{frozenset({'a', 2, 1}), frozenset({'x', 5, 6}), frozenset({3, 'b', 4})}

I know that set is an unordered data structure and this is normal case but I want to preserve the order of the elements here so that I can thereafter insert the elements in a pandas dataframe. The dataframe will look like this:

 Name  Marks1  Marks2
0    a       1       2
1    b       3       4
2    x       5       6

Instead of operating on the set of frozenset s directly you could use that only as a helper data-structure - like in the unique_everseen recipe in the itertools section (copied verbatim):

from itertools import filterfalse

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

Basically this would solve the issue when you use key=frozenset :

>>> x = [('a',1,2), ('b',3,4), ('x',5,6), ('a',2,1)]

>>> list(unique_everseen(x, key=frozenset))
[('a', 1, 2), ('b', 3, 4), ('x', 5, 6)]

This returns the elements as-is and it also maintains the relative order between the elements.

No ordering with frozensets. You can instead create sorted tuples to check for the existence of an item, adding the original if the tuple does not exist in the set:

y = set()
lst = []
for i in x:
    t = tuple(sorted(i, key=str)
    if t not in y:
         y.add(t)
         lst.append(i)
print(lst)
# [('a', 1, 2), ('b', 3, 4), ('x', 5, 6)]

The first entry gets preserved.

There are some quite useful functions in NumPy which can help you to solve this problem.

import numpy as np
chrs, indices = np.unique(list(map(lambda x:x[0], x)), return_index=True)
chrs, indices
>> (array(['a', 'b', 'x'], 
   dtype='<U1'), array([0, 1, 2]))
[x[indices[i]] for i in range(indices.size)]
>> [('a', 1, 2), ('b', 3, 4), ('x', 5, 6)]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM