
Replacing python list elements with key

I have a list of non-unique strings:

list = ["a", "b", "c", "a", "a", "d", "b"]

I would like to replace each element with an integer key that uniquely identifies each string:

list = [0, 1, 2, 0, 0, 3, 1]

The number does not matter, as long as it is a unique identifier.

So far, all I can think of is to copy the list to a set and use the index of the set to reference the list. I'm sure there's a better way, though.

This guarantees uniqueness and that the ids are contiguous, starting from 0:

id_s = {c: i for i, c in enumerate(set(list))}
li = [id_s[c] for c in list]
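
The set-based mapping above hands out ids in arbitrary set order. If the ids should instead follow the order in which each string first appears, one option is to deduplicate with dict.fromkeys, which keeps insertion order on Python 3.7+. This is only a sketch; the names items, ids and encoded are made up for illustration:

```python
items = ["a", "b", "c", "a", "a", "d", "b"]

# dict.fromkeys drops duplicates but keeps first-seen order (Python 3.7+)
ids = {s: i for i, s in enumerate(dict.fromkeys(items))}
encoded = [ids[s] for s in items]

print(encoded)  # [0, 1, 2, 0, 0, 3, 1]
```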

On a different note, you should not use list as a variable name, because it will shadow the built-in type list.
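
To see why the shadowing matters, here is a small sketch (shadow_demo is a hypothetical helper, not part of any answer) in which a local variable named list makes the built-in unreachable, so a later call such as list(...) fails:

```python
def shadow_demo():
    list = ["a", "b"]  # this local name hides the built-in inside the function
    try:
        return list("ab")  # tries to call the list object, not the type
    except TypeError as exc:
        return str(exc)

message = shadow_demo()
print(message)  # 'list' object is not callable
```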

Here's a single-pass solution with defaultdict:

from collections import defaultdict
seen = defaultdict()
seen.default_factory = lambda: len(seen)  # you could instead bind to seen.__len__

In [11]: [seen[c] for c in list]
Out[11]: [0, 1, 2, 0, 0, 3, 1]

It's kind of a trick, but worth mentioning!


An alternative, suggested by @user2357112 in a related question/answer, is to increment with itertools.count. This allows you to do everything in the constructor:

from collections import defaultdict
from itertools import count
seen = defaultdict(count().__next__)  # .next in python 2

This may be preferable, as the default_factory method won't have to look up seen in the global scope.
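
Put together, the count-based idea can be wrapped in a function so that each call starts from a fresh counter. This is a minimal sketch; the function name encode is an assumption chosen for illustration:

```python
from collections import defaultdict
from itertools import count


def encode(items):
    """Map each distinct item to an int id in order of first appearance."""
    # A missing key triggers default_factory, which pulls the next counter value.
    seen = defaultdict(count().__next__)
    return [seen[item] for item in items]


result = encode(["a", "b", "c", "a", "a", "d", "b"])
print(result)  # [0, 1, 2, 0, 0, 3, 1]
```

Because seen and the counter are local to the function, repeated calls do not share state.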

>>> lst = ["a", "b", "c", "a", "a", "d", "b"]
>>> # ord() returns the code point, so this only works for one-character strings
>>> nums = [ord(x) for x in lst]
>>> print(nums)
[97, 98, 99, 97, 97, 100, 98]

If you are not picky, then use the hash function: it returns an integer. For strings that are the same, it returns the same hash. Note that the values are large rather than contiguous, and that string hashes change between interpreter runs unless PYTHONHASHSEED is fixed:

li = ["a", "b", "c", "a", "a", "d", "b"]
ids = list(map(hash, li))          # Turn list of strings into list of ints
ids = [hash(item) for item in li]  # Same result with a list comprehension

A functional approach:

from itertools import count
from operator import itemgetter

l = ["a", "b", "c", "a", "a", "d", "b", "abc", "def", "abc"]

# Build a string -> index lookup, then fetch the id of every element at once
mapped = itemgetter(*l)(dict(zip(l, count())))
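
One thing to be aware of: zip pairs each element with its position, and dict keeps the last value written for a repeated key, so the ids are unique per string but not contiguous, and the result is a tuple rather than a list. A quick check on a shorter list (the names items and lookup are illustrative only):

```python
from itertools import count
from operator import itemgetter

items = ["a", "b", "c", "a", "a", "d", "b"]

# Later duplicates overwrite earlier ones, so each string keeps its last index.
lookup = dict(zip(items, count()))
mapped = itemgetter(*items)(lookup)

print(lookup)  # {'a': 4, 'b': 6, 'c': 2, 'd': 5}
print(mapped)  # (4, 6, 2, 4, 4, 5, 6)
```

Also note that itemgetter returns a bare value instead of a tuple when given a single key, and raises TypeError when given none, so this form is best suited to lists with at least two elements.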

You could also use a simple generator function:

from itertools import count

def uniq_ident(l):
    cn, d = count(), {}
    for ele in l:
        if ele not in d:
            c = next(cn)
            d[ele] = c
            yield c
        else:
            yield d[ele]


In [35]: l = ["a", "b", "c", "a", "a", "d", "b"]

In [36]: list(uniq_ident(l))
Out[36]: [0, 1, 2, 0, 0, 3, 1]
