
How to insert values into numpy array with groupby summation

I have an empty NumPy array a, an array v of values to insert, and an array i of indices where those values should go. I want to insert the values from v into a at the indices in i. When the values in i are unique, this is simply a[i] = v.

How can I do this when i contains duplicates and I want the values for duplicate indices to be summed?

With duplicate indices in i, only the last occurrence in i is used:

from numpy import *
a = zeros(5)
i = array([1, 1, 2, 3])
v = array([10, 20, 30, 40])
a[i] = v
print(a) # [ 0. 20. 30. 40.  0.]

A loop over i works, but it is slow:

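a[:] = 0  # reset a first, because a[i] = v above already filled it
for idx, val in zip(i, v):
    a[idx] += val  # accumulate, so duplicate indices are summed
print(a) # [ 0. 30. 30. 40.  0.]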

An algorithm that repeatedly finds, uses, and removes the unique values in i seems too complex for such a simple task.

How can I do this summation without a loop?

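A similar question was asked here: Add multiple values to one numpy array index

The answer is to use the unbuffered ufunc method add.at:

add.at(a, i, v)  # i.e. numpy.add.at; repeated indices in i accumulate into a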
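
To make the difference concrete, here is a minimal self-contained sketch that contrasts ordinary fancy-index addition with np.add.at on the data from the question. The variable names buffered and unbuffered are only illustrative and do not come from the original posts.

```python
import numpy as np

i = np.array([1, 1, 2, 3])
v = np.array([10, 20, 30, 40])

# Buffered fancy-index addition: a[i] += v is evaluated as a[i] = a[i] + v,
# so for a repeated index only one contribution survives.
buffered = np.zeros(5)
buffered[i] += v

# Unbuffered ufunc.at: every (index, value) pair is applied in turn,
# so contributions for a repeated index accumulate.
unbuffered = np.zeros(5)
np.add.at(unbuffered, i, v)

print(buffered)
print(unbuffered)
```

Because np.add.at works in place, it also adds onto whatever a already contains, which is useful when a is not empty to begin with.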

The answer proposed by @Anton is pretty good. You can also use np.bincount with weights, a built-in function suited to this purpose:

import numpy as np
a = np.bincount(i, weights=v, minlength=5)  # sums v per index value in i
print(a) # [ 0. 30. 30. 40.  0.]
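
Note that np.bincount builds a new array rather than updating an existing one. If a already holds values, one possible way to combine the two is sketched below. The helper name scatter_add is hypothetical, and the sketch assumes that a is a float array and that i contains non-negative integers smaller than a.size.

```python
import numpy as np

def scatter_add(a, i, v):
    """Add v into a at positions i, summing duplicates (hypothetical helper)."""
    # Assumption: a is a float array and 0 <= i < a.size, so the
    # bincount result has exactly a.size entries.
    a += np.bincount(i, weights=v, minlength=a.size)
    return a

a = np.ones(5)  # a non-empty starting array
i = np.array([1, 1, 2, 3])
v = np.array([10, 20, 30, 40])
result = scatter_add(a, i, v)
print(result)
```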

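An equivalent pandas groupby solution:

import pandas as pd
a = np.zeros(5)  # start from zeros; only the indices present in i are overwritten
df = pd.DataFrame(v).groupby(i).sum()
a[df.index] = df.to_numpy().flatten()
print(a) # [ 0. 30. 30. 40.  0.]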

You can also achieve this with np.diff or np.searchsorted, but I find the solutions above more readable.
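
The answer does not show the np.diff variant, so the following is only one possible reading of it, not the answerer's code: sort the indices, use np.diff to find where each group of equal indices starts, and sum each group with np.add.reduceat. The names order, starts and sums are illustrative, and the sketch assumes i is non-empty.

```python
import numpy as np

i = np.array([1, 1, 2, 3])
v = np.array([10, 20, 30, 40])

# Sort so that equal indices become adjacent.
order = np.argsort(i, kind="stable")
i_sorted = i[order]
v_sorted = v[order]

# A group starts at position 0 and wherever the sorted index changes.
starts = np.r_[0, np.flatnonzero(np.diff(i_sorted)) + 1]

# Sum each run of equal indices, then write one value per unique index.
sums = np.add.reduceat(v_sorted, starts)
a = np.zeros(5)
a[i_sorted[starts]] = sums
print(a)
```

Compared with np.add.at or np.bincount, this takes more steps, which matches the remark that the other solutions are more readable.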
