
Vectorization - Adding numpy arrays without loops?

So I have the following numpy arrays:

c = array([[ 1,  2,  3],
           [ 4,  5,  6],
           [ 7,  8,  9],
           [10, 11, 12]])
X = array([[10, 15, 20,  5],
           [ 1,  2,  6, 23]])
y = array([1, 1])

I am trying to add each row of X (4 values) to one of the columns of c. The y array specifies which column: y[k] is the column that row k of X is added to. In the example above, both rows of X are added to column 1 of c, so the expected result is:

     c = array([[ 1,  2+10+1,  3],  =  array([[ 1,  13,  3],
                [ 4,  5+15+2,  6],            [ 4,  22,  6],
                [ 7,  8+20+6,  9],            [ 7,  34,  9],
                [10, 11+5+23, 12]])           [10,  39, 12]])  

Does anyone know how I can do this without any loops? I tried c[:,y] += X, but it seems to add only the second row of X to column 1 of c. Note that y is not necessarily [1,1]; it could also be [0,1], in which case the first row of X is added to column 0 of c and the second row of X to column 1 of c.

My first thought when I saw your desired calculation was to just sum the two rows of X and add the result to the second column of c (column index 1):

In [636]: c = array([[ 1,  2,  3],
           [ 4,  5,  6],
           [ 7,  8,  9],
           [10, 11, 12]])

In [637]: c[:,1]+=X.sum(axis=0)

In [638]: c
Out[638]: 
array([[ 1, 13,  3],
       [ 4, 22,  6],
       [ 7, 34,  9],
       [10, 39, 12]])

But to work from a general index like y, we need a special unbuffered operation, at least when there are duplicates in y:

In [639]: c = array([[ 1,  2,  3],
           [ 4,  5,  6],
           [ 7,  8,  9],
           [10, 11, 12]])

In [641]: np.add.at(c,(slice(None),y),X.T)

In [642]: c
Out[642]: 
array([[ 1, 13,  3],
       [ 4, 22,  6],
       [ 7, 34,  9],
       [10, 39, 12]])

Look up the .at method of ufuncs in the numpy docs.

In IPython, np.add.at? shows the docstring, which includes:

Performs unbuffered in place operation on operand 'a' for elements specified by 'indices'. For addition ufunc, this method is equivalent to a[indices] += b , except that results are accumulated for elements that are indexed more than once. For example, a[[0,0]] += 1 will only increment the first element once because of buffering, whereas add.at(a, [0,0], 1) will increment the first element twice.
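To see the difference the docstring describes, here is a small sketch on a 1-D array. The zero-filled starting arrays and the names buffered and unbuffered are assumptions made for illustration only:

```python
import numpy as np

# Buffered: a[[0, 0]] += 1 reads a[[0, 0]] into a temporary, adds 1,
# then writes the temporary back, so index 0 is incremented only once.
a = np.zeros(3, dtype=int)
a[[0, 0]] += 1
buffered = a.copy()

# Unbuffered: np.add.at applies the addition once per index occurrence.
b = np.zeros(3, dtype=int)
np.add.at(b, [0, 0], 1)
unbuffered = b.copy()

print(buffered)    # [1 0 0]
print(unbuffered)  # [2 0 0]
```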

With a different y (after resetting c to its original values), it still works:

In [645]: np.add.at(c,(slice(None),[0,2]),X.T)

In [646]: c
Out[646]: 
array([[11,  2,  4],
       [19,  5,  8],
       [27,  8, 15],
       [15, 11, 35]])
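The two sessions above can be collected into one self-contained script. This is only a sketch: the helper name add_rows_to_columns is hypothetical, and each call works on a copy so that c0 keeps its original values. The comments spell out why the index is written as a tuple and why X is transposed.

```python
import numpy as np

def add_rows_to_columns(c, X, y):
    """Add row X[k] to column y[k] of c, in place (hypothetical helper)."""
    # (slice(None), y) is the tuple form of the index in c[:, y].
    # c[:, y] has shape (rows of c, len(y)), so X is transposed to match.
    np.add.at(c, (slice(None), y), X.T)
    return c

c0 = np.array([[ 1,  2,  3],
               [ 4,  5,  6],
               [ 7,  8,  9],
               [10, 11, 12]])
X = np.array([[10, 15, 20,  5],
              [ 1,  2,  6, 23]])

# Duplicate columns: both rows of X accumulate into column 1.
dup = add_rows_to_columns(c0.copy(), X, np.array([1, 1]))
# Distinct columns: row 0 goes to column 0, row 1 goes to column 2.
distinct = add_rows_to_columns(c0.copy(), X, np.array([0, 2]))

print(dup)
print(distinct)
```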

Firstly, your code seems to work in general if you transpose X. For example:

import numpy as np

c = np.array([[ 1,  2,  3],
              [ 4,  5,  6],
              [ 7,  8,  9],
              [10, 11, 12]])
# Transposed X has shape (4, 2), which matches the shape of c[:, y].
X = np.array([[10, 15, 20,  5],
              [ 1,  2,  6, 23]]).transpose()
y = np.array([1, 2])

c[:, y] += X
print(c)
#OUTPUT:
#[[ 1 12  4]
# [ 4 20  8]
# [ 7 28 15]
# [10 16 35]]

However, it doesn't work when y contains duplicate columns, as in your specific example. I believe this is because c[:, [1,1]] produces an array with two columns, each a copy of c[:, 1]. Both correspond to the same part of c, so during the addition they are both read first, then the corresponding column of X is added to each, and then both are written back to c[:, 1], meaning that only one of the two writes survives as the final value. A plain indexed += cannot accumulate over duplicate indices, because that would require updating a column, saving its value, and then updating it again later.
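A short sketch of that behaviour, using the arrays from the question. The explicit loop is only there as a reference result, and the names expected and buffered are assumptions for this illustration:

```python
import numpy as np

c = np.array([[ 1,  2,  3],
              [ 4,  5,  6],
              [ 7,  8,  9],
              [10, 11, 12]])
X = np.array([[10, 15, 20,  5],
              [ 1,  2,  6, 23]])
y = np.array([1, 1])

# Reference result: add each row of X to its column, one at a time.
expected = c.copy()
for row, col in zip(X, y):
    expected[:, col] += row

# Indexed +=: roughly tmp = buffered[:, y] + X.T; buffered[:, y] = tmp.
# Both columns of tmp are written to column 1, so only one write survives.
buffered = c.copy()
buffered[:, y] += X.T

print(expected[:, 1])                      # [13 22 34 39]
print(np.array_equal(buffered, expected))  # False
```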

You might have to settle for no duplicates, or otherwise implement something like an accumulator.
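One possible loop-free accumulator, sketched here as an assumption rather than as anything the answer itself proposed, is to turn y into a one-hot matrix and use a matrix product, which sums the contributions of duplicate columns. The function name add_with_onehot is hypothetical.

```python
import numpy as np

def add_with_onehot(c, X, y):
    """Return c plus the rows of X accumulated into the columns named by y."""
    # Row k of onehot is all zeros except for a 1 in column y[k];
    # its shape is (len(y), number of columns of c).
    onehot = np.eye(c.shape[1], dtype=c.dtype)[y]
    # (rows of c, len(y)) @ (len(y), columns of c): duplicates in y are summed.
    return c + X.T @ onehot

c = np.array([[ 1,  2,  3],
              [ 4,  5,  6],
              [ 7,  8,  9],
              [10, 11, 12]])
X = np.array([[10, 15, 20,  5],
              [ 1,  2,  6, 23]])

with_duplicates = add_with_onehot(c, X, np.array([1, 1]))
without_duplicates = add_with_onehot(c, X, np.array([0, 2]))
print(with_duplicates)
print(without_duplicates)
```

Unlike an in-place update, this returns a new array and leaves c unchanged.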

This is the solution I came up with:

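import numpy as np

def my_func(c, X, y):
    # Layer k of cc is zero except for column y[k], which holds X[k].
    cc = np.zeros((len(y), c.shape[0], c.shape[1]))
    cc[range(len(y)), :, y] = X
    # Summing the layers accumulates duplicates; the result is a new float array.
    return c + np.sum(cc, 0)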

The following interactive session demonstrates how it works:

>>> my_func(c, X, y)
array([[  1.,  13.,   3.],
       [  4.,  22.,   6.],
       [  7.,  34.,   9.],
       [ 10.,  39.,  12.]])
>>> y2 = np.array([0, 2])
>>> my_func(c, X, y2)
array([[ 11.,   2.,   4.],
       [ 19.,   5.,   8.],
       [ 27.,   8.,  15.],
       [ 15.,  11.,  35.]])


 