I have a dictionary:
d = {'a':1, 'b':2, 'c':3}
and my list of keys:
keys = np.array(['a','b','a','c','a','b'])
I would like to get the list of corresponding values without using for loops.
I tried the following loop-based version, but it is too computationally expensive for my purposes:
l = [d[i] for i in keys]
Do you know a version without for loops, perhaps one that exploits broadcasting or masks on np.array?
For the general case, the list comprehension approach [d[i] for i in keys] is fine.
For very large lists, one way to gain some performance is to define a structured array, which allows working with mixed types, and use np.searchsorted:
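import numpy as np

def str_array(d, keys):
    items = list(d.items())
    k, v = zip(*items)
    # Size the integer field from the itemsize of the values' NumPy integer type
    # and reuse the string dtype NumPy infers for the dict keys.
    dtype_v = np.max(v).itemsize
    dtype_k = np.array(k).dtype
    a = np.array(items, dtype=[('key', dtype_k),
                               ('value', f'i{dtype_v}')])
    # Binary-search each lookup key among the dict keys taken in sorted order.
    ixs_s = np.argsort(a['key'])
    k_ixs = ixs_s[np.searchsorted(a['key'], keys, sorter=ixs_s)]
    return a['value'][k_ixs]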
str_array(d,keys)
# array([1, 2, 1, 3, 1, 2])
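To make the idea behind the function above easier to follow, here is a minimal sketch of the same sort-then-binary-search lookup using two plain parallel arrays instead of a structured array. The name lookup_sorted and the KeyError check for absent keys are illustrative assumptions, not part of the original answer.

```python
import numpy as np

def lookup_sorted(d, keys):
    """Hypothetical helper: map an array of keys to values via np.searchsorted."""
    # Split the dict into two parallel arrays and sort both by key.
    dict_keys = np.array(list(d.keys()))
    dict_vals = np.array(list(d.values()))
    order = np.argsort(dict_keys)
    sorted_keys = dict_keys[order]
    sorted_vals = dict_vals[order]

    # Binary-search every lookup key in the sorted key array.
    pos = np.searchsorted(sorted_keys, keys)

    # searchsorted returns an insertion point even for absent keys,
    # so clip to a valid index and check that each hit really matches.
    pos = np.clip(pos, 0, len(sorted_keys) - 1)
    if not np.all(sorted_keys[pos] == keys):
        raise KeyError("some keys are not present in the dictionary")
    return sorted_vals[pos]

d = {'a': 1, 'b': 2, 'c': 3}
keys = np.array(['a', 'b', 'a', 'c', 'a', 'b'])
values = lookup_sorted(d, keys)
```

Unlike a dict lookup, np.searchsorted does not fail on a missing key; it silently returns the position where the key would be inserted, which is why the sketch adds an explicit check.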
Let's compare performance against some other typical approaches:
import operator
import numpy as np
import pandas as pd
import perfplot

d = {'key1':100, 'some_other_key':8, 'key3':15, 'nth_key':0}
perfplot.show(
    setup=lambda n: np.random.choice(list(d.keys()), size=n),
    kernels=[
        lambda x: np.array([d[i] for i in x]),
        lambda x: np.vectorize(d.get)(x),
        lambda x: pd.Series(d).loc[x].values,
        lambda x: operator.itemgetter(*x)(d),
        lambda x: str_array(d, x),
    ],
    labels=['list-comp', 'np.vectorize', 'pd.loc', 'itemgetter', 'str_array'],
    n_range=[2**k for k in range(0, 20)],
    xlabel='N'
)
So, for instance, for n=100_000:
keys = np.random.choice(list(d.keys()), size=100_000)
%timeit str_array(d, keys)
# 5.51 ms ± 231 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit [d[i] for i in keys]
# 51.7 ms ± 1.97 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
With the second approach, using np.searchsorted, the lookup is roughly 9 times faster than the simple list comprehension (5.51 ms versus 51.7 ms).
I don't know about relative performance, but I find this solution very fast and simple: convert your keys to a Series, then use the built-in pandas map method to return your answer.
import numpy as np
import pandas as pd
d = {'a':1, 'b':2, 'c':3}
keys = np.array(['a','b','a','c','a','b'])
keys1 = pd.Series(keys)
keys1.map(d)
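As a small follow-up sketch (the variable names values and with_missing are just for illustration), the mapped Series can be converted back to a NumPy array with to_numpy(). Note that, with a plain dict, keys that are absent from the dictionary map to NaN rather than raising an error.

```python
import numpy as np
import pandas as pd

d = {'a': 1, 'b': 2, 'c': 3}
keys = np.array(['a', 'b', 'a', 'c', 'a', 'b'])

# Map every key through the dict and return a plain NumPy array.
values = pd.Series(keys).map(d).to_numpy()

# A key that is not in the dict ('z' here) becomes NaN instead of raising.
with_missing = pd.Series(['a', 'z']).map(d)
```

If missing keys should be treated as an error, checking with_missing.isna().any() after the mapping is one way to detect them.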