How to assign ordinal numbers of unique values to a list in Python?

Suppose I have a list

A = ['A', 'A', 'A', 'B', 'B', 'C']

How can I turn it into

B = [0, 0, 0, 1, 1, 2]

?

I wrote it this way:

C = {t[1]:t[0] for t in enumerate(list(set(A)))}
B = [C[e] for e in A]

and it gave

[1, 1, 1, 2, 2, 0]

i.e. the order appears random, and the code as a whole looks complex.

Is there a simpler way?

You can try something nasty (albeit much more understandable than your current code) like:

>>> B = [ord(x) - 65 for x in A]
>>> B
[0, 0, 0, 1, 1, 2]
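Note that this relies on the elements being single characters that run consecutively from 'A' (ord('A') is 65). A minimal sketch of where it holds and where it does not; the helper name ord_codes is made up for illustration:

```python
def ord_codes(values):
    """Map single-character strings to their offset from 'A' (hypothetical helper)."""
    return [ord(x) - ord('A') for x in values]

print(ord_codes(['A', 'A', 'A', 'B', 'B', 'C']))  # [0, 0, 0, 1, 1, 2]
print(ord_codes(['A', 'C']))  # [0, 2]: a gap in the letters leaves a gap in the codes
```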

If A is a big list, consider letting B be a generator, like so:

B = (ord(x) - 65 for x in A)

a = ['A', 'A', 'A', 'B', 'B', 'C']
x = sorted(set(a))  # unique values in sorted order
b = [x.index(y) for y in a]  # position of each element among the sorted uniques
print(b)  # [0, 0, 0, 1, 1, 2]

Do you want the order to be determined by the alphabetical order of the unique elements, or by the order in which they first appear in the original list? For instance, should ['C', 'A', 'A', 'A', 'B', 'B', 'C'] turn into [2, 0, 0, 0, 1, 1, 2] or [0, 1, 1, 1, 2, 2, 0]? If the former:

uniques = list(set(A))
uniques.sort()
uniques_dict = {uniques[i]:i for i in range(len(uniques))}
B = [uniques_dict[a] for a in A]
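The same sorted mapping can be built a little more compactly with enumerate. This is only a sketch, assuming the elements are hashable and mutually comparable so that sorted works; sorted_ordinals is a hypothetical name:

```python
def sorted_ordinals(values):
    """Return, for each element, its rank among the sorted unique values."""
    # enumerate pairs each unique value with its position in sorted order
    rank = {value: i for i, value in enumerate(sorted(set(values)))}
    return [rank[value] for value in values]

print(sorted_ordinals(['C', 'A', 'A', 'A', 'B', 'B', 'C']))  # [2, 0, 0, 0, 1, 1, 2]
```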

For the latter:

uniques_dict = {}
ordinal = 0
for a in A:
    if a not in uniques_dict:  # first time this value appears
        uniques_dict[a] = ordinal
        ordinal += 1
B = [uniques_dict[a] for a in A]
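On Python 3.7 and later, where dicts preserve insertion order, the first-appearance mapping can also be sketched with dict.fromkeys. The function name first_seen_ordinals is an assumption made for this example:

```python
def first_seen_ordinals(values):
    """Number each element by the order in which its value first appears."""
    # dict.fromkeys keeps one key per unique value, in first-appearance order
    rank = {value: i for i, value in enumerate(dict.fromkeys(values))}
    return [rank[value] for value in values]

print(first_seen_ordinals(['C', 'A', 'A', 'A', 'B', 'B', 'C']))  # [0, 1, 1, 1, 2, 2, 0]
```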

It seems that constructing a dictionary/mapping is the key; using it will just be variations on a theme. Even constructing the dictionary will be variations on a theme; whether one is better, worse, simpler, or more complicated is in the eye of the reader.

>>> import itertools
>>> ordinates = itertools.count(0)
>>> a = ['a', 'b', 'c', 'a', 'a', 'c', 'c']
>>> unique = sorted(set(a))
>>> d = {thing:ordinal for thing, ordinal in zip(unique, ordinates)}

Apply it:

>>> list(map(d.get, a))
[0, 1, 2, 0, 0, 2, 2]
>>>

Note that d.get returns None for items in a that are not in d; indexing with d[item] instead would raise a KeyError.
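A small sketch of how missing items behave with d.get versus plain indexing. The list items and the key 'z' are made up for illustration:

```python
d = {'a': 0, 'b': 1, 'c': 2}
items = ['a', 'z', 'c']  # 'z' is deliberately missing from d

# dict.get falls back to None for a missing key
print(list(map(d.get, items)))  # [0, None, 2]

# plain indexing raises KeyError for a missing key
try:
    result = [d[item] for item in items]
except KeyError as exc:
    result = None
    print('missing key:', exc)  # missing key: 'z'
```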

A similar approach; this one raises a KeyError for items that are not in d:

>>> import operator
>>> a = ['a','b','c', 'a', 'a', 'c','c']
>>> m = map(operator.itemgetter, a)
>>> [get(d) for get in m]
[0, 1, 2, 0, 0, 2, 2]
>>>

Similar, without the caveat:

class Foo(dict):
    def __call__(self, item):
        '''Returns self[item] or None.'''
        try:
            return self[item]
        except KeyError as e:
            # print or log something descriptive - print(repr(e))
            return None

>>> ordinates = itertools.count(0)
>>> a = ['a','b','c', 'a', 'a', 'c','c']
>>> unique = sorted(set(a))
>>> d = Foo((thing,ordinal) for thing, ordinal in zip(unique, ordinates))
>>> result = list(map(d, a))
>>> result
[0, 1, 2, 0, 0, 2, 2]
>>>

All of that assumed you wanted the ordinal positions of the sorted items, as your example list was conveniently pre-sorted. If you were looking for the position in the list where each unique thing first occurred, construct the mapping like this:

import itertools
ordinal = itertools.count()
b = ['c','b','c', 'a', 'a', 'c','c']
d = {}
for thing in b:
    if thing in d:
        continue
    d[thing] = next(ordinal)

Application:

>>> list(map(d.get, b))
[0, 1, 0, 2, 2, 0, 0]
>>>

@Abdou alluded to this in his comment, but you conveniently didn't answer.

If you have a one-liner fetish, that can be written as

d = {}
d.update((thing,d[thing] if thing in d else next(ordinal)) for thing in b)
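Another compact way to get first-appearance numbering, offered here as an alternative sketch rather than as part of the answer above, is a collections.defaultdict whose default factory is the counter's __next__ method:

```python
import itertools
from collections import defaultdict

b = ['c', 'b', 'c', 'a', 'a', 'c', 'c']

# Each missing key is assigned the next integer from the counter on first access
d = defaultdict(itertools.count().__next__)
result = [d[thing] for thing in b]
print(result)  # [0, 1, 0, 2, 2, 0, 0]
```

Unlike the update one-liner, this does not depend on the dictionary being filled while the generator is still being consumed.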

I will assume that: 1. you don't rely on the elements being letters; 2. you want to index them based on their first appearance in the list A.

>>> A = ['A', 'A', 'A', 'B', 'B', 'C']
>>> seen=set()
>>> C={x:len(seen)-1 for x in A if not (x in seen or seen.add(x))}
>>> C
{'B': 1, 'C': 2, 'A': 0}
>>> list(map(C.get, A))
[0, 0, 0, 1, 1, 2]

The second line defines a set, seen, which stores the elements of A that the dict comprehension on the next line has already encountered.

The third line defines the dictionary that maps unique elements to their indices. It's a little tricky (although not so unusual).

We iterate through the values of A.

  • Case 1: the value x is in seen, thus x in seen or ... is True, the second part is not evaluated, and not(...) returns False: x is ignored.

  • Case 2: the value x is not in seen, thus x in seen is False and the second part is evaluated. Recall that seen.add always returns None, which is equivalent to False in this context. x in seen or seen.add(x) is therefore falsy, but x has been added to seen. not(...) then returns True: x is mapped to len(seen) - 1, and len(seen) grows by one for each new element.

The sixth line simply maps each value of A through the newly defined dictionary.
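The two cases can be spelled out as an explicit loop. This is a sketch of the same logic, not a replacement for the comprehension; the function name first_seen_index is hypothetical:

```python
def first_seen_index(values):
    """Map each unique value to the order in which it first appears."""
    seen = set()
    mapping = {}
    for x in values:
        if x in seen:
            # Case 1: x was already recorded, so it is skipped
            continue
        # Case 2: record x first, so that its index is len(seen) - 1
        seen.add(x)
        mapping[x] = len(seen) - 1
    return mapping

A = ['A', 'A', 'A', 'B', 'B', 'C']
C = first_seen_index(A)
print([C[x] for x in A])  # [0, 0, 0, 1, 1, 2]
```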
