
Numpy: Create 2D array based on value from dict, using other 2D array value as index

I have two 2D NumPy arrays:

import numpy as np
a = np.array([['a', 'b', 'c'], ['d', 'e', 'f']])
b = np.array([[0.01, 0.02, 0.03], [0.04, 0.05, 0.06]])

I also have a dict with some values:

d = {'a': 100, 'b': 200, ... 'f':600}

Now I want to create a 2D array based on the first two arrays and my dict. Something like this:

c = b * d[a]

In other words, I want to multiply each element of array b by the value in the dict d whose key is the element of array a at the same index. For the example above, the result would be:

c = np.array([[1, 4, 9], [16, 25, 36]])

Is there any way to do this besides a nested loop?

You can apply np.vectorize to the d.get method to use the values of a as keys for d:

>>> np.vectorize(d.get)(a)
array([[100, 200, 300],
       [400, 500, 600]])

Note that np.vectorize is implemented as a Python-level loop behind the scenes, so it won't give you much (if any) performance benefit.

You can combine this into one line:

>>> b*np.vectorize(d.get)(a)
array([[  1.,   4.,   9.],
       [ 16.,  25.,  36.]])
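As a self-contained sketch of the np.vectorize approach, the snippet below rebuilds the inputs from the question and adds two things that are not in the original answer: an explicit otypes=[float], so the output dtype is fixed up front rather than inferred from the first lookup, and a NaN default for missing keys. The full contents of d are taken from the lookup output shown above; the name lookup and the NaN fallback are illustrative choices.

```python
import numpy as np

a = np.array([['a', 'b', 'c'], ['d', 'e', 'f']])
b = np.array([[0.01, 0.02, 0.03], [0.04, 0.05, 0.06]])
# Values as shown by the np.vectorize(d.get)(a) output above.
d = {'a': 100, 'b': 200, 'c': 300, 'd': 400, 'e': 500, 'f': 600}

# Hypothetical helper: look up each element of an array in d.
# otypes=[float] fixes the result dtype; unknown keys map to NaN (an assumption).
lookup = np.vectorize(lambda key: d.get(key, np.nan), otypes=[float])

c = b * lookup(a)
print(c)

# A key that is absent from d yields NaN instead of None.
missing = lookup(np.array(['a', 'z']))
print(missing)
```

This is still a Python-level loop over every element, so it is a convenience rather than a speed-up.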

Here's a vectorized approach that uses NumPy functions throughout:

# Convert the single-character strings in a to numeric labels (0 for 'a', 1 for 'b', ...).
# Viewing as uint32 exposes the Unicode code points; this requires a.dtype to be '<U1'.
aID = (a.view(np.uint32) - 97).reshape(a.shape)

# Get the argsort indices that put the dictionary keys in sorted order
keys = np.array(list(d.keys()))
sidx = np.argsort(keys)

# Extract the values from d, ordered by the argsort indices.
# Then, index with the numeric labels from a and multiply with b.
d_vals = np.array(list(d.values()))[sidx]
out = b*d_vals[aID]

Please note that the keys are assumed to be single-character strings. If they are not in that format, you can use np.unique to get numeric labels corresponding to the elements in a, like so:

aID = np.unique(a,return_inverse=True)[1].reshape(a.shape)
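To show how the np.unique labels might be used end to end with multi-character keys, here is a minimal sketch. The inputs (a, b, d with keys 'ab', 'cd', 'ef') are made up for illustration, and building vals from the unique strings is one possible way to line the dict values up with the labels, not necessarily what the answer's author had in mind.

```python
import numpy as np

# Hypothetical inputs with multi-character keys.
a = np.array([['ab', 'cd', 'ef'], ['cd', 'ab', 'ef']])
b = np.array([[0.01, 0.02, 0.03], [0.04, 0.05, 0.06]])
d = {'ab': 100, 'cd': 200, 'ef': 300}

# uniq holds the sorted distinct strings in a; inv holds, for every element
# of a, the index of that element within uniq.
uniq, inv = np.unique(a, return_inverse=True)
aID = inv.reshape(a.shape)

# Look up the dict value once per distinct string (a short Python loop),
# in the same order as uniq, so that vals[aID] matches a element by element.
vals = np.array([d[key] for key in uniq])

c = b * vals[aID]
print(c)
```

The only Python-level loop here runs over the distinct strings, so its cost depends on the number of unique keys rather than on the size of a.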

Runtime test

In this section, let's use those 6 keys with bigger arrays and time all the approaches posted thus far, including the nested-loop approach mentioned in the question:

In [238]: def original_app(a,b,d): # From question
     ...:     c = np.zeros(a.shape)
     ...:     for i in range(a.shape[0]):
     ...:         for j in range(a.shape[1]):
     ...:             c[i,j] = b[i,j] * d[a[i,j]]
     ...:     return c
     ...: 
     ...: def vectorized_app(a,b,d): # Proposed code earlier
     ...:     aID = (a.view(np.uint32)-97).reshape(a.shape)
     ...:     sidx = np.argsort(list(d.keys()))
     ...:     d_vals= np.array(list(d.values()))[sidx]
     ...:     return b*d_vals[aID]
     ...: 

In [239]: # Setup inputs
     ...: M, N = 400,500 # Data size
     ...: d = {'a': 600, 'b': 100, 'c': 700, 'd': 550, 'e': 200, 'f':80}
     ...: strings = np.array(list(d.keys()))
     ...: a = strings[np.random.randint(0,6,(M,N))]
     ...: b = np.random.rand(*a.shape)
     ...: 

In [240]: %timeit original_app(a,b,d)
1 loops, best of 3: 219 ms per loop

In [241]: %timeit b*np.vectorize(d.get)(a) # @TheBlackCat's solution
10 loops, best of 3: 34.9 ms per loop

In [242]: %timeit vectorized_app(a,b,d)
100 loops, best of 3: 3.17 ms per loop

In [243]: np.allclose(original_app(a,b,d),vectorized_app(a,b,d))
Out[243]: True

In [244]: np.allclose(original_app(a,b,d),b*np.vectorize(d.get)(a))
Out[244]: True
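For rerunning a comparison like this outside IPython, a plain-Python sketch using the standard timeit module could look as follows. The function names loop_app, vectorize_app and unique_app are hypothetical, the random seed is arbitrary, and unique_app swaps in the np.unique labelling mentioned earlier in place of the single-character trick, so its timing is not directly comparable to the vectorized_app figure above. No timings are quoted here because they depend on the machine and the NumPy version.

```python
import timeit
import numpy as np

def loop_app(a, b, d):
    # Nested-loop baseline, as in the question.
    c = np.zeros(a.shape)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            c[i, j] = b[i, j] * d[a[i, j]]
    return c

def vectorize_app(a, b, d):
    # np.vectorize over d.get (a Python-level loop internally).
    return b * np.vectorize(d.get)(a)

def unique_app(a, b, d):
    # Label elements with np.unique, then index an array of dict values.
    uniq, inv = np.unique(a, return_inverse=True)
    vals = np.array([d[key] for key in uniq])
    return b * vals[inv.reshape(a.shape)]

# Inputs of the same size as in the transcript above.
M, N = 400, 500
d = {'a': 600, 'b': 100, 'c': 700, 'd': 550, 'e': 200, 'f': 80}
strings = np.array(list(d.keys()))
rng = np.random.default_rng(0)  # arbitrary seed
a = strings[rng.integers(0, len(strings), (M, N))]
b = rng.random(a.shape)

for func in (loop_app, vectorize_app, unique_app):
    # Best of 3 single runs, reported in milliseconds.
    best = min(timeit.repeat(lambda: func(a, b, d), number=1, repeat=3))
    print(f"{func.__name__}: {best * 1e3:.2f} ms")
```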
