How to apply a dictionary with arrays as values to a NumPy array

I'm trying to map this dictionary

dict = {
5: np.array([1,1,1,1,1], dtype='int'),
4: np.array([1,1,1,1,0], dtype='int'),
3: np.array([1,1,1,0,0], dtype='int'),
2: np.array([1,1,0,0,0], dtype='int'),
1: np.array([1,0,0,0,0], dtype='int'),
0: np.array([0,0,0,0,0], dtype='int'),
-1: np.array([-1,0,0,0,0], dtype='int'),
-2: np.array([-1,-1,0,0,0], dtype='int'),
-3: np.array([-1,-1,-1,0,0], dtype='int'),
-4: np.array([-1,-1,-1,-1,0], dtype='int'),
-5: np.array([-1,-1,-1,-1,-1], dtype='int')}

onto this NumPy array

target
array([[ 2,  0,  2,  0,  0,  3,  0,  0,  1,  0,  0, -2,  4, -2,  0,  0,
        -3, -3, -5,  1,  0,  0,  0,  2],
       [ 4,  4,  3,  2,  0,  0,  0,  1,  0,  0,  0,  0,  0,  0,  0,  0,
         1, -1, -2, -1, -2, -2, -3, -4],...])

The elements of the NumPy array are int32. How can I apply this mapping?

You can simply use a nested list comprehension (mydict is the dictionary from the question under a different name):

[[mydict[j] for j in i] for i in target]

This yields:

[[array([1, 1, 0, 0, 0]), array([0, 0, 0, 0, 0]), array([1, 1, 0, 0, 0]), array([0, 0, 0, 0, 0]), array([0, 0, 0, 0, 0]), array([1, 1, 1, 0, 0]), array([0, 0, 0, 0, 0]), array([0, 0, 0, 0, 0]), array([1, 0, 0, 0, 0]), array([0, 0, 0, 0, 0]), array([0, 0, 0, 0, 0]), array([-1, -1,  0,  0,  0]), array([1, 1, 1, 1, 0]), array([-1, -1,  0,  0,  0]), array([0, 0, 0, 0, 0]), array([0, 0, 0, 0, 0]), array([-1, -1, -1,  0,  0]), array([-1, -1, -1,  0,  0]), array([-1, -1, -1, -1, -1]), array([1, 0, 0, 0, 0]), array([0, 0, 0, 0, 0]), array([0, 0, 0, 0, 0]), array([0, 0, 0, 0, 0]), array([1, 1, 0, 0, 0])], [array([1, 1, 1, 1, 0]), array([1, 1, 1, 1, 0]), array([1, 1, 1, 0, 0]), array([1, 1, 0, 0, 0]), array([0, 0, 0, 0, 0]), array([0, 0, 0, 0, 0]), array([0, 0, 0, 0, 0]), array([1, 0, 0, 0, 0]), array([0, 0, 0, 0, 0]), array([0, 0, 0, 0, 0]), array([0, 0, 0, 0, 0]), array([0, 0, 0, 0, 0]), array([0, 0, 0, 0, 0]), array([0, 0, 0, 0, 0]), array([0, 0, 0, 0, 0]), array([0, 0, 0, 0, 0]), array([1, 0, 0, 0, 0]), array([-1,  0,  0,  0,  0]), array([-1, -1,  0,  0,  0]), array([-1,  0,  0,  0,  0]), array([-1, -1,  0,  0,  0]), array([-1, -1,  0,  0,  0]), array([-1, -1, -1,  0,  0]), array([-1, -1, -1, -1,  0])]]

As an aside, avoid using dict as a variable name, because it shadows the Python built-in dict.

You can use a list comprehension and feed the result to np.array:

res = np.array([list(map(d.__getitem__, row)) for row in target])

array([[[ 1,  1,  0,  0,  0],
        [ 0,  0,  0,  0,  0],
        [ 1,  1,  0,  0,  0],
        ...
        [ 0,  0,  0,  0,  0],
        [ 0,  0,  0,  0,  0],
        [ 1,  1,  0,  0,  0]],

       [[ 1,  1,  1,  1,  0],
        [ 1,  1,  1,  1,  0],
        [ 1,  1,  1,  0,  0],
        ...
        [-1, -1,  0,  0,  0],
        [-1, -1, -1,  0,  0],
        [-1, -1, -1, -1,  0]]])

Note that the dictionary has been renamed d: don't shadow built-ins.
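To make the shape of the result concrete, here is a minimal sketch of the same pattern on a reduced input. The two-element value arrays and the small `target` are made up for illustration and are not the data from the question.

```python
import numpy as np

# Hypothetical reduced dictionary: every value has the same length (2).
d = {
    -1: np.array([-1, 0]),
     0: np.array([0, 0]),
     1: np.array([1, 0]),
}

# Hypothetical int32 target with 2 rows and 3 columns.
target = np.array([[1, 0, -1], [0, 0, 1]], dtype=np.int32)

# d.__getitem__ is the bound method behind d[key]; map applies it to every element of a row.
res = np.array([list(map(d.__getitem__, row)) for row in target])

print(res.shape)  # (2, 3, 2): rows, columns, length of each mapped array
```

Because all values share one length, np.array can stack the nested lists into a single 3-D array.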

You can iterate over the target array and build a new list with the mapped values, which you can convert into an array afterwards if you want.

Something like this, for a one-dimensional target:

new_target = []
for e in target:
    new_target.append(the_dict[e])

new_target = np.array(new_target)

EDIT: If the target has more than one dimension, a second loop is an option.
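The reason a single loop is not enough for a two-dimensional target can be sketched as follows. The small dictionary and the names `flat_target` and `grid_target` are hypothetical and only serve the illustration.

```python
import numpy as np

# Hypothetical reduced dictionary.
the_dict = {0: np.array([0, 0]), 1: np.array([1, 0])}

# One-dimensional target: each element is a NumPy integer scalar,
# which hashes and compares like the matching Python int key.
flat_target = np.array([1, 0, 1])
flat_result = np.array([the_dict[e] for e in flat_target])

# Two-dimensional target: the outer loop yields whole rows (ndarrays),
# and an ndarray is not hashable, so it cannot be used as a dictionary key.
grid_target = np.array([[1, 0], [0, 1]])
try:
    [the_dict[e] for e in grid_target]
    row_lookup_error = None
except TypeError as exc:
    row_lookup_error = exc

print(flat_result.shape)       # (3, 2)
print(type(row_lookup_error))  # <class 'TypeError'>
```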

import numpy as np

my_dict = {
     5: np.array([ 1, 1, 1, 1, 1], dtype='int'),
     4: np.array([ 1, 1, 1, 1, 0], dtype='int'),
     3: np.array([ 1, 1, 1, 0, 0], dtype='int'),
     2: np.array([ 1, 1, 0, 0, 0], dtype='int'),
     1: np.array([ 1, 0, 0, 0, 0], dtype='int'),
     0: np.array([ 0, 0, 0, 0, 0], dtype='int'),
    -1: np.array([-1, 0, 0, 0, 0], dtype='int'),
    -2: np.array([-1,-1, 0, 0, 0], dtype='int'),
    -3: np.array([-1,-1,-1, 0, 0], dtype='int'),
    -4: np.array([-1,-1,-1,-1, 0], dtype='int'),
    -5: np.array([-1,-1,-1,-1,-1], dtype='int'),
}

target = np.array([
    [ 2,  0,  2,  0,  0,  3,  0,  0,  1,  0,
      0, -2,  4, -2,  0,  0, -3, -3, -5,  1,
      0,  0,  0,  2],
    [ 4,  4,  3,  2,  0,  0,  0,  1,  0,  0,
      0,  0,  0,  0,  0,  0,  1, -1, -2, -1,
     -2, -2, -3, -4],
])

new_target = []
for num_list in target:          # one row of target at a time
    sub_new_target = []
    for n in num_list:           # one integer key at a time
        sub_new_target.append(my_dict[n])
    new_target.append(sub_new_target)

# Stack the nested lists into a single 3-D array
new_target = np.array(new_target)

print(target.shape)
print(target)
print(new_target.shape)
print(new_target)

Since the keys in your dictionary are contiguous integers, I would recommend using an array instead of a dictionary for performance. Creating such an array is straightforward: stack the values in key order.

mapper = np.stack([i[1] for i in sorted(d.items())])

array([[-1, -1, -1, -1, -1],
       [-1, -1, -1, -1,  0],
       [-1, -1, -1,  0,  0],
       [-1, -1,  0,  0,  0],
       [-1,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0],
       [ 1,  0,  0,  0,  0],
       [ 1,  1,  0,  0,  0],
       [ 1,  1,  1,  0,  0],
       [ 1,  1,  1,  1,  0],
       [ 1,  1,  1,  1,  1]])

Now you only have to adjust your indices slightly. The general idea is that where you currently have a value matching a key in your dictionary, you should instead have a value matching a row index in your mapper array. This is a much more performant option than using a dictionary when working with large arrays.

For your current array, this involves simply incrementing each value by 5 (the smallest key, -5, then maps to row 0), and now you have vectorized indexing:

mapper[target+5]

array([[[ 1,  1,  0,  0,  0],
        [ 0,  0,  0,  0,  0],
        [ 1,  1,  0,  0,  0],
        [ 0,  0,  0,  0,  0],
        [ 0,  0,  0,  0,  0],
        ...
        [ 0,  0,  0,  0,  0],
        [ 1,  1,  0,  0,  0]],

       [[ 1,  1,  1,  1,  0],
        [ 1,  1,  1,  1,  0],
        [ 1,  1,  1,  0,  0],
        [ 1,  1,  0,  0,  0],
        [ 0,  0,  0,  0,  0],
        ...
        [-1, -1, -1,  0,  0],
        [-1, -1, -1, -1,  0]]])
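Rather than hard-coding the shift of 5, the offset can be derived from the smallest key. The following is a minimal sketch under the assumption that the keys are contiguous integers. The reduced dictionary, the small `target`, and the explicit contiguity check are illustrative additions, not part of the original answer.

```python
import numpy as np

# Hypothetical reduced dictionary with contiguous integer keys -2..2.
d = {
    -2: np.array([-1, -1]),
    -1: np.array([-1, 0]),
     0: np.array([0, 0]),
     1: np.array([1, 0]),
     2: np.array([1, 1]),
}

keys = sorted(d)
# The indexing trick only works if no key is missing between min and max.
if keys != list(range(keys[0], keys[-1] + 1)):
    raise ValueError("keys must be contiguous integers")

offset = -keys[0]                        # shifts the smallest key to row 0
mapper = np.stack([d[k] for k in keys])  # row i holds the value for key keys[0] + i

target = np.array([[2, 0, -1], [-2, 1, 0]])
result = mapper[target + offset]         # vectorized lookup, shape (2, 3, 2)

# Cross-check against the plain dictionary lookup.
expected = np.array([[d[j] for j in i] for i in target])
print(np.array_equal(result, expected))  # True
```

If the keys were not contiguous, this indexing scheme would not apply directly and the dictionary-based approaches would still work.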

Timings

big_target = np.repeat(target, 10000, axis=0)

In [307]: %%timeit
 ...: mapper = np.stack([i[1] for i in sorted(d.items())])
 ...: mapper[big_target+5]
 ...:
10.5 ms ± 54.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [309]: %%timeit
     ...: np.array([list(map(d.__getitem__, row)) for row in big_target])
     ...:
368 ms ± 1.31 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [311]: %timeit np.array([[d[j] for j in i] for i in big_target])
361 ms ± 4.35 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Even with the slight overhead from creating an array from your dictionary, we're looking at a 35x speedup on a (20000, 24) shape array.
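To reproduce the comparison outside IPython, a rough sketch with the standard-library timeit module could look like this. The helper `row`, the random `big_target`, and the smaller array size are assumptions made for the sketch, and absolute timings will differ from machine to machine.

```python
import timeit
import numpy as np

def row(k, width=5):
    """Rebuild the question's pattern: sign(k) repeated |k| times, zero-padded."""
    sign = 1 if k > 0 else -1
    return np.array([sign] * abs(k) + [0] * (width - abs(k)), dtype=int)

d = {k: row(k) for k in range(-5, 6)}

# Hypothetical test data: random keys in -5..5 (the upper bound is exclusive).
rng = np.random.default_rng(0)
big_target = rng.integers(-5, 6, size=(2000, 24))

def with_mapper():
    # Includes the cost of building the lookup array from the dictionary.
    mapper = np.stack([d[k] for k in sorted(d)])
    return mapper[big_target + 5]

def with_dict():
    return np.array([[d[j] for j in i] for i in big_target])

runs = 5
t_mapper = timeit.timeit(with_mapper, number=runs) / runs
t_dict = timeit.timeit(with_dict, number=runs) / runs
print(f"mapper: {t_mapper * 1e3:.2f} ms, dict: {t_dict * 1e3:.2f} ms")
```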
