
Maintaining the order of the elements in a frozen set

I have a list of tuples, each of which contains one string and two integers. The list looks like this:

x = [('a',1,2), ('b',3,4), ('x',5,6), ('a',2,1)]

The list contains thousands of such tuples. To get the unique combinations, I can apply frozenset to each element of the list as follows:

y = set(map(frozenset, x))

This gives me the following result:

{frozenset({'a', 2, 1}), frozenset({'x', 5, 6}), frozenset({3, 'b', 4})}
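To illustrate why the reordered tuple ('a', 2, 1) disappears from this result, here is a minimal sketch reusing the list x from above. It checks that two frozensets built from the same values in a different order compare equal and hash identically, so the outer set keeps only one of them. The names first and last are introduced here purely for illustration.

```python
x = [('a', 1, 2), ('b', 3, 4), ('x', 5, 6), ('a', 2, 1)]

first = frozenset(x[0])   # built from ('a', 1, 2)
last = frozenset(x[3])    # built from ('a', 2, 1)

# Frozensets ignore element order, so both compare equal and share a hash.
print(first == last)              # True
print(hash(first) == hash(last))  # True

# The outer set therefore collapses them into a single entry.
y = set(map(frozenset, x))
print(len(y))                     # 3
```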

I know that a set is an unordered data structure and that this behaviour is expected, but I want to preserve the order of the elements here so that I can afterwards insert them into a pandas DataFrame. The DataFrame should look like this:

  Name  Marks1  Marks2
0    a       1       2
1    b       3       4
2    x       5       6

Instead of operating on the set of frozensets directly, you could use it only as a helper data structure, as in the unique_everseen recipe from the itertools documentation (copied verbatim):

from itertools import filterfalse

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

Passing key=frozenset solves the problem:

>>> x = [('a',1,2), ('b',3,4), ('x',5,6), ('a',2,1)]

>>> list(unique_everseen(x, key=frozenset))
[('a', 1, 2), ('b', 3, 4), ('x', 5, 6)]

This returns the elements as-is and also maintains their relative order.
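To get from the de-duplicated list to the DataFrame described in the question, one possible sketch follows. It inlines the same "seen set" idea instead of importing the recipe, so that it runs on its own, and it assumes pandas is installed. The column names are taken from the question's desired output; the variable names unique_rows and df are illustrative.

```python
import pandas as pd

x = [('a', 1, 2), ('b', 3, 4), ('x', 5, 6), ('a', 2, 1)]

# Order-preserving de-duplication, using a frozenset only as the lookup key.
seen = set()
unique_rows = []
for row in x:
    key = frozenset(row)
    if key not in seen:
        seen.add(key)
        unique_rows.append(row)   # keep the original tuple, not the frozenset

# Each tuple becomes one row; the tuple positions map to the three columns.
df = pd.DataFrame(unique_rows, columns=['Name', 'Marks1', 'Marks2'])
print(df)
```

Because the original tuples are kept, the string stays in the first column and the integers keep their positions, which a frozenset alone would not guarantee.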

Frozensets have no ordering. Instead, you can build a sorted tuple from each item to use as a membership key, and append the original tuple whenever that key has not been seen yet:

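y = set()   # sorted-tuple keys seen so far
lst = []    # unique tuples in their original order
for i in x:
    # Sort by string representation so str and int values can be compared
    t = tuple(sorted(i, key=str))
    if t not in y:
        y.add(t)
        lst.append(i)
print(lst)
# [('a', 1, 2), ('b', 3, 4), ('x', 5, 6)]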

The first occurrence of each combination is preserved.
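One difference between the two kinds of key may be worth knowing: a frozenset drops repeated values inside a tuple, whereas a sorted tuple keeps them. If repeated values can occur in the data and should count, a key built from collections.Counter is one possible alternative. The sketch below is only an illustration; the helper names multiset_key and unique_by and the sample rows are hypothetical and not part of the original answers.

```python
from collections import Counter

def multiset_key(row):
    """Hashable key that ignores order but keeps how often each value occurs."""
    return frozenset(Counter(row).items())

def unique_by(rows, key):
    """Return the first occurrence of each key, in the original order."""
    seen = set()
    result = []
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            result.append(row)
    return result

# Hypothetical rows containing repeated values
rows = [('a', 1, 1), ('a', 'a', 1), (1, 'a', 1)]

# frozenset sees {'a', 1} for all three rows, so only the first survives.
print(unique_by(rows, frozenset))     # [('a', 1, 1)]

# The Counter-based key distinguishes the first two rows.
print(unique_by(rows, multiset_key))  # [('a', 1, 1), ('a', 'a', 1)]
```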

NumPy's np.unique function, called with return_index=True, can also help with this problem.

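import numpy as np

# Note: this compares only the first field (the name) of each tuple.
names = [t[0] for t in x]
# return_index=True also returns the index of each value's first occurrence.
chrs, indices = np.unique(names, return_index=True)
print(chrs, indices)
# ['a' 'b' 'x'] [0 1 2]
# np.unique sorts its output, so sort the indices to restore the input order.
print([x[i] for i in sorted(indices)])
# [('a', 1, 2), ('b', 3, 4), ('x', 5, 6)]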
