Python: Enumerate a list of string 'keys' into ints

I searched for a while but didn't find anything that explained exactly what I'm trying to do.

Basically, I have a list of string "labels", e.g. ["brown", "black", "blue", "brown", "brown", "black"]. What I want to do is convert this into a list of integers where each distinct label corresponds to its own integer, so

["brown", "black", "blue", "brown", "brown", "black"]

becomes

[1, 2, 3, 1, 1, 2]

I looked into the enumerate() function, but when I gave it my list of strings (which is quite long), it assigned a different int to every element instead of giving the same label the same int:

[(1,"brown"),(2,"black"),(3,"blue"),(4,"brown"),(5,"brown"),(6,"black")]

I know how to do this with a long and cumbersome for loop and if-else checks, but I'm curious whether there's a more elegant way to do it in only one or two lines.

You have non-unique labels; you can use a defaultdict combined with a counter to generate a number the first time each label is accessed:

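from collections import defaultdict
from itertools import count
from functools import partial

# A missing key calls next() on the shared counter, so each new label gets the next number (1, 2, 3, ...)
label_to_number = defaultdict(partial(next, count(1)))
[(label_to_number[label], label) for label in labels]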

This numbers the labels in the order in which each one first occurs in labels.

Demo:

>>> labels = ["brown", "black", "blue", "brown", "brown", "black"]
>>> label_to_number = defaultdict(partial(next, count(1)))
>>> [(label_to_number[label], label) for label in labels]
[(1, 'brown'), (2, 'black'), (3, 'blue'), (1, 'brown'), (1, 'brown'), (2, 'black')]

Because we are using a dictionary, each label-to-number lookup takes constant time on average, so the whole operation takes time linear in the length of the labels list.
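The question asks for a plain list of integers rather than (number, label) pairs. A minimal sketch using the same defaultdict/count trick, wrapped in a helper (the name encode_labels is hypothetical, not from the answer), returns just the integers together with the label-to-number mapping:

```python
from collections import defaultdict
from functools import partial
from itertools import count


def encode_labels(labels):
    """Number each distinct label 1, 2, 3, ... in order of first occurrence.

    Returns the list of integer codes and the label -> code mapping.
    """
    # A missing key calls next() on the shared counter, handing out the next number.
    label_to_number = defaultdict(partial(next, count(1)))
    codes = [label_to_number[label] for label in labels]
    # Convert to a plain dict so later lookups of unknown labels raise KeyError
    # instead of silently allocating new numbers.
    return codes, dict(label_to_number)


labels = ["brown", "black", "blue", "brown", "brown", "black"]
codes, mapping = encode_labels(labels)
print(codes)    # [1, 2, 3, 1, 1, 2]
print(mapping)  # {'brown': 1, 'black': 2, 'blue': 3}
```

Each call builds a fresh count(1), so encoding a second list starts numbering from 1 again.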

Alternatively, use a set() to get the unique values, then map these to an enumerate() count:

label_to_number = {label: i for i, label in enumerate(set(labels), 1)}
[(label_to_number[label], label) for label in labels]

This assigns numbers more arbitrarily, because set() objects are unordered; since string hashing is randomized per interpreter run by default, the numbering can even differ from one run to the next:

>>> label_to_number = {label: i for i, label in enumerate(set(labels), 1)}
>>> [(label_to_number[label], label) for label in labels]
[(2, 'brown'), (3, 'black'), (1, 'blue'), (2, 'brown'), (2, 'brown'), (3, 'black')]

This requires looping through labels twice, though.

Neither approach requires you to define a dictionary of labels first; the mapping is created automatically.
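If you like the two-pass approach but want a reproducible numbering, two variants (not from the original answer) avoid the arbitrary set() order: dict.fromkeys() removes duplicates while keeping first-occurrence order (dicts preserve insertion order in Python 3.7+), and sorted() gives an alphabetical numbering. A sketch:

```python
labels = ["brown", "black", "blue", "brown", "brown", "black"]

# dict.fromkeys keeps each label once, in first-occurrence order,
# so this matches the defaultdict numbering and is stable across runs.
by_first_seen = {label: i for i, label in enumerate(dict.fromkeys(labels), 1)}

# sorted() numbers the labels alphabetically, also stable across runs.
by_alpha = {label: i for i, label in enumerate(sorted(set(labels)), 1)}

print([by_first_seen[label] for label in labels])  # [1, 2, 3, 1, 1, 2]
print([by_alpha[label] for label in labels])       # [3, 1, 2, 3, 3, 1]
```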

You could first create a dictionary like:

label_map = {"brown": 1, "black": 2, "blue": 3}  # avoid naming it dict, which shadows the built-in

And then:

li = ["brown", "black", "blue", "brown", "brown", "black"]
[label_map[i] for i in li]

Try this:

lst = ["brown", "black", "blue", "brown", "brown", "black"]
d = {"brown":1, "black":2, "blue":3}

[d[k] for k in lst]
=> [1, 2, 3, 1, 1, 2]

Of course, for this to work you have to define the equivalences somewhere; above, I used a dictionary for that. Otherwise, there's no way to know that the color brown corresponds to the number 1, and so on.
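With a hand-written mapping, any label missing from the dictionary raises a KeyError. If unknown labels may appear, a minimal sketch using dict.get with a fallback keeps the comprehension from failing; the name label_map, the extra "green" label, and the sentinel 0 are assumptions for illustration:

```python
label_map = {"brown": 1, "black": 2, "blue": 3}
labels = ["brown", "black", "blue", "green", "brown"]

# dict.get returns the fallback (here 0, an arbitrary "unknown" marker)
# instead of raising KeyError for labels not in label_map.
codes = [label_map.get(label, 0) for label in labels]
print(codes)  # [1, 2, 3, 0, 1]

# Plain indexing fails on the unknown label:
try:
    [label_map[label] for label in labels]
except KeyError as exc:
    print("unknown label:", exc)  # unknown label: 'green'
```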

The simplest piece of code that reproduces your requested answer is:

l = ["brown", "black", "blue", "brown", "brown", "black"]
i = [l.index(x) + 1 for x in l]  # 1-based position of each label's first occurrence
print(i)

[1, 2, 3, 1, 1, 2]

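It generates exactly what you asked for with no preparation of any sort, but for a long list it gets slow: each l.index() call scans the list from the beginning, so the whole comprehension takes quadratic time. It also numbers each label by the position of its first occurrence, so the numbers are only consecutive when all distinct labels appear before any repeat, as in this example; for ["brown", "brown", "black"] it would give [1, 1, 3].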
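A small comparison sketch shows where the l.index() numbering and a consecutive numbering diverge. The function names index_codes and dense_codes are hypothetical; dense_codes is a swapped-in one-pass variant using dict.setdefault, not part of the original answer:

```python
def index_codes(labels):
    # The answer's approach: 1-based position of each label's first occurrence.
    # list.index scans from the start every time, so this is O(n^2) overall.
    return [labels.index(x) + 1 for x in labels]


def dense_codes(labels):
    # One pass: a new label gets len(mapping) + 1, an existing one keeps its number.
    mapping = {}
    return [mapping.setdefault(x, len(mapping) + 1) for x in labels]


colors = ["brown", "black", "blue", "brown", "brown", "black"]
print(index_codes(colors))  # [1, 2, 3, 1, 1, 2]
print(dense_codes(colors))  # [1, 2, 3, 1, 1, 2]

repeats_first = ["brown", "brown", "black"]
print(index_codes(repeats_first))  # [1, 1, 3]
print(dense_codes(repeats_first))  # [1, 1, 2]
```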

Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original post. For any questions, contact yoyou2525@163.com.