Enumerate unique strings in list

Disclaimer: I'm not an experienced Python user.

I encountered a task and now I'm trying to figure out the most elegant way to do it in Python.

Here's the task itself: given a list of strings, return a list of ints (each int from 0 to N - 1, where N is the number of unique strings in the list), where each int corresponds to a certain string from the initial list. Identical strings should map to the same number, and different strings to different numbers.

The first thing I came up with seems "a little bit" overcomplicated:

a = ["a","b","a","c","b","a"]
list(map(lambda x: dict(map(lambda x: reversed(x), enumerate(set(a))))[x], a))

The result of the code above:

[0, 2, 0, 1, 2, 0]

You can use dict and list comprehensions:

>>> a = ["a","b","a","c","b","a"]
>>> d = {x:i for i, x in enumerate(set(a))}
>>> [d[item] for item in a]
[0, 2, 0, 1, 2, 0]
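
Note that a set has no defined order, so the exact numbers shown above can differ between runs or Python versions. If a reproducible numbering matters more than first-appearance order, one option (a sketch, not part of the original answer) is to sort the unique strings before enumerating them:

```python
a = ["a", "b", "a", "c", "b", "a"]

# Sorting the unique strings gives a deterministic numbering
# (alphabetical here), unlike iterating over the set directly.
d = {x: i for i, x in enumerate(sorted(set(a)))}
codes = [d[item] for item in a]
print(codes)  # [0, 1, 0, 2, 1, 0]
```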

To preserve order:

>>> seen = set()
>>> d = { x:i for i, x in enumerate(y for y in a
                                       if y not in seen and not seen.add(y))}
>>> [d[item] for item in a]
[0, 1, 0, 2, 1, 0]

The above dict comprehension is equivalent to:

>>> seen = set()
>>> lis = []
>>> for item in a:
...     if item not in seen:
...         seen.add(item)
...         lis.append(item)
...
>>> lis
['a', 'b', 'c']
>>> d = {x:i for i,x in enumerate(lis)}
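
On Python 3.7 and later, regular dicts preserve insertion order, so the same first-seen ordering can be obtained with dict.fromkeys. A minimal sketch, assuming Python 3.7+ (the name unique_in_order is only illustrative):

```python
a = ["a", "b", "a", "c", "b", "a"]

# dict.fromkeys keeps one entry per key, in order of first appearance
# (insertion order is guaranteed for plain dicts from Python 3.7 on).
unique_in_order = list(dict.fromkeys(a))
d = {x: i for i, x in enumerate(unique_in_order)}
codes = [d[item] for item in a]
print(unique_in_order)  # ['a', 'b', 'c']
print(codes)            # [0, 1, 0, 2, 1, 0]
```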

I think your approach with set could lead to errors if you want to preserve the order in which the strings first appear. You can actually see it in your example: 'b' got index 2 instead of 1. If you want to keep the order, you can use OrderedDict:

>>> from collections import OrderedDict
>>> a = ["a","b","a","c","b","a"]
>>> d = {x:i for i, x in enumerate(OrderedDict(zip(a, a)).values())}
>>> [d[x] for x in a]
[0, 1, 0, 2, 1, 0]

Emphasis on readability, not speed: I would use the list index method with a list comprehension:

>>> a = ["a","b","a","c","b","a"]
>>> b = list(set(a))
>>> c = [b.index(x) for x in a]
>>> c
[0, 2, 0, 1, 2, 0]

First get the unique strings from the list and enumerate them, so you have a number (from 0 to N-1) for each string. Then look up this number for each of the strings and put it in a list. Here is how it is done in one line:

a = ["a","b","a","c","b","a"]
[{s:i for i, s in enumerate(set(a))}[s] for s in a]
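
The one-liner above rebuilds the dictionary for every element of a, so the work grows quadratically with the length of the list. A sketch of the same idea with the mapping built only once (the name index_of is just illustrative):

```python
a = ["a", "b", "a", "c", "b", "a"]

# Build the string -> number mapping a single time...
index_of = {s: i for i, s in enumerate(set(a))}
# ...then reuse it for every lookup.
codes = [index_of[s] for s in a]
print(codes)  # exact numbers depend on the set's iteration order
```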

You can also do it with a defaultdict and count iterator.

>>> from collections import defaultdict
>>> from itertools import count
>>> a = ["a","b","a","c","b","a"]
>>> x = defaultdict(count().__next__)
>>> [x[i] for i in a]
[0, 1, 0, 2, 1, 0]
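
The defaultdict works because looking up a missing key calls the factory, which hands out the next counter value and stores it under that key. For comparison, here is a sketch of the same logic written out with a plain dict (the function name enumerate_unique is hypothetical):

```python
def enumerate_unique(strings):
    """Number each string by the order in which it first appears."""
    mapping = {}
    codes = []
    for s in strings:
        # A new string gets the next free number, len(mapping);
        # a string seen before keeps the number it already has.
        codes.append(mapping.setdefault(s, len(mapping)))
    return codes

print(enumerate_unique(["a", "b", "a", "c", "b", "a"]))  # [0, 1, 0, 2, 1, 0]
```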
