Disclaimer: I'm not an experienced Python user.
I encountered a task and now I'm trying to figure out the most elegant way to do it in Python.
Here's the task itself: given a list of strings, return a list of ints (each int from 0 to N - 1, where N is the number of unique strings in the list), where each int corresponds to a certain string from the initial list. Same strings should be mapped to same numbers, and different strings to different numbers.
The first thing I came up with seems "a little bit" overcomplicated:
a = ["a","b","a","c","b","a"]
map(lambda x: dict(map(lambda x: reversed(x), enumerate(set(a))))[x], a)
The result of the code above:
[0, 2, 0, 1, 2, 0]
You can use dict and list comprehensions:
>>> a = ["a","b","a","c","b","a"]
>>> d = {x:i for i, x in enumerate(set(a))}
>>> [d[item] for item in a]
[0, 2, 0, 1, 2, 0]
To preserve order:
>>> seen = set()
>>> d = {x:i for i, x in enumerate(y for y in a
...                                if y not in seen and not seen.add(y))}
>>> [d[item] for item in a]
[0, 1, 0, 2, 1, 0]
The above dict comprehension is equivalent to:
>>> seen = set()
>>> lis = []
>>> for item in a:
...     if item not in seen:
...         seen.add(item)
...         lis.append(item)
...
>>> lis
['a', 'b', 'c']
>>> d = {x:i for i,x in enumerate(lis)}
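The loop above can also be folded into a small helper that assigns numbers as it goes. This is only a sketch of a variation on the same idea; the function name `encode_in_order` is made up for illustration and does not come from the answer:

```python
def encode_in_order(items):
    """Map each item to an int, numbering items in order of first appearance."""
    codes = {}   # item -> assigned number (hypothetical helper state)
    result = []
    for item in items:
        if item not in codes:
            # The next free number equals how many distinct items were seen so far.
            codes[item] = len(codes)
        result.append(codes[item])
    return result


print(encode_in_order(["a", "b", "a", "c", "b", "a"]))  # [0, 1, 0, 2, 1, 0]
```

It makes a single pass over the list and does not need a separate `seen` set.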
I think your approach with set could lead to errors if you want to preserve the order in which the strings first appear. You can actually see it in your example: 'b' got index 2 instead of 1. If you want to keep the order, you can use OrderedDict:
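>>> from collections import OrderedDict
>>> from itertools import izip  # Python 2 only; on Python 3 use the built-in zip
>>> a = ["a","b","a","c","b","a"]
>>> d = {x:i for i, x in enumerate(OrderedDict(izip(a, a)).values())}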
>>> [d[x] for x in a]
[0, 1, 0, 2, 1, 0]
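On Python 3.7 and later, where plain dicts keep insertion order, the same order-preserving idea can probably be written without OrderedDict. A minimal sketch under that version assumption (the variable name `codes` is only for illustration):

```python
a = ["a", "b", "a", "c", "b", "a"]

# dict.fromkeys keeps one key per distinct string, in order of first appearance
# (relies on dicts preserving insertion order, i.e. Python 3.7+).
d = {x: i for i, x in enumerate(dict.fromkeys(a))}
codes = [d[x] for x in a]

print(codes)  # [0, 1, 0, 2, 1, 0]
```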
Emphasis on readability, not speed: I would use the list index method with a list comprehension:
>>> a = ["a","b","a","c","b","a"]
>>> b = list(set(a))
>>> c = [b.index(x) for x in a]
>>> c
[0, 2, 0, 1, 2, 0]
First get the unique strings from the list and enumerate them, so that you have a number (from 0 to N-1) for each string. Then look up this number for each of the strings and put it in a list. Here is how it is done in one line:
a = ["a","b","a","c","b","a"]
[{s:i for i, s in enumerate(set(a))}[s] for s in a]
You can also do it with a defaultdict and count iterator.
>>> from collections import defaultdict
>>> from itertools import count
>>> a = ["a","b","a","c","b","a"]
>>> x = defaultdict(count().next)
>>> [x[i] for i in a]
[0, 1, 0, 2, 1, 0]
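The snippet above is written for Python 2, where iterators have a `.next` method. On Python 3 that method is named `__next__`, so a rough equivalent might look like this (the variable name `codes` is only for illustration):

```python
from collections import defaultdict
from itertools import count

a = ["a", "b", "a", "c", "b", "a"]

# Each missing key calls the factory, which pulls the next integer from count().
x = defaultdict(count().__next__)
codes = [x[i] for i in a]

print(codes)  # [0, 1, 0, 2, 1, 0]
```

Numbers are handed out in order of first appearance, because a new value is only drawn the first time a string is looked up.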