
How to assign values to given indices of an array and average over repeated indices?

Is there a neat way to assign values to given indices in an array, and average the values at repeated indices? For example:

a = np.array([0, 0, 0, 0, 0])
ind = np.array([1, 1, 2, 3])
b = np.array([2, 3, 4, 5])

I want to assign the values in array b to array a at the corresponding indices given in ind, and a[1] should be the average of 2 and 3.

I can try a for-loop:

hit = np.zeros_like(a)
for i in range(ind.size):
    hit[ind[i]] += 1
    a[ind[i]] += b[i]
a = a / hit  # indices that were never hit evaluate 0/0 and become nan

But this code looks dirty. Is there any better way to do the job?

You could do this using np.where:

import numpy as np
a = np.array([0, 0, 0, 0, 0]).astype('float64')
ind = np.array([1, 1, 2, 3])
b = np.array([2, 3, 4, 5])

for i in set(ind):
    a[i] = np.mean(b[np.where(ind == i)])

Would result in:

In [5]: a
Out[5]: array([0. , 2.5, 4. , 5. , 0. ])

You are essentially finding all positions in ind whose value equals i, taking the mean of the values of b at those positions, and assigning that mean to a[i]. Hope this helps!
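The same idea can be sketched with a boolean mask, since b[ind == i] selects the same elements as b[np.where(ind == i)]. This is only an illustration: the helper name mean_by_mask and its size parameter are assumptions, not part of the answer above, and np.unique stands in for set(ind).

```python
import numpy as np

def mean_by_mask(ind, b, size):
    """Average the values of b that share an index in ind (hypothetical helper)."""
    out = np.zeros(size, dtype=float)  # slots that are never targeted stay 0.0
    for i in np.unique(ind):           # visit each distinct target index once
        out[i] = b[ind == i].mean()    # boolean mask picks the matching values
    return out

ind = np.array([1, 1, 2, 3])
b = np.array([2, 3, 4, 5])
result = mean_by_mask(ind, b, 5)
```

The loop still scans ind once per distinct index, so it is best suited to cases with few distinct indices.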

Here is a vectorized method. The actual logic is close to your own solution.

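# a must be a float array here, e.g. a = np.zeros(5)
n, d = (np.bincount(ind, x, a.size) for x in (b, None))  # per-index sums, then counts
valid = d != 0  # indices that received at least one value
np.copyto(a, np.divide(n, d, where=valid), where=valid)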
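For readers who prefer the two bincount calls spelled out, here is a minimal self-contained sketch. The function name mean_by_bincount is hypothetical, and unlike the copyto version it returns a fresh array with 0.0 in untouched slots instead of preserving existing values of a.

```python
import numpy as np

def mean_by_bincount(ind, b, size):
    """Per-index mean via two bincount passes (hypothetical helper)."""
    sums = np.bincount(ind, weights=b, minlength=size)  # sum of b per index
    counts = np.bincount(ind, minlength=size)           # number of hits per index
    out = np.zeros(size, dtype=float)
    # Divide only where counts > 0; the other slots keep the 0.0 already in out.
    np.divide(sums, counts, out=out, where=counts > 0)
    return out

ind = np.array([1, 1, 2, 3])
b = np.array([2, 3, 4, 5])
averaged = mean_by_bincount(ind, b, 5)
```

Passing out= together with where= avoids reading uninitialized values at indices that were never hit.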

Your loop, started from a float array, accumulates the sums and the hit counts:

In [56]: a = np.zeros(5)
    ...: hit = np.zeros_like(a)
    ...: for i in range(ind.size):
    ...:     hit[ind[i]] += 1
    ...:     a[ind[i]] += b[i]

In [57]: a
Out[57]: array([0., 5., 4., 5., 0.])
In [58]: hit
Out[58]: array([0., 2., 1., 1., 0.])

The mention of duplicate indices brings to mind the .at ufunc method:

In [59]: a = np.zeros(5)
In [60]: a = np.zeros(5)
    ...: hit = np.zeros_like(a)
    ...: np.add.at(a,ind,b)
    ...: np.add.at(hit,ind,1)
In [61]: a
Out[61]: array([0., 5., 4., 5., 0.])
In [62]: hit
Out[62]: array([0., 2., 1., 1., 0.])

This isn't quite as fast as a plain a[ind] = b (which does not accumulate over repeated indices), but it is faster than your loop.

np.bincount might well be better for this task, but this add.at is worth knowing and testing.

https://docs.scipy.org/doc/numpy/reference/generated/numpy.ufunc.at.html
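To finish the job with add.at, the sums still have to be divided by the hit counts. The following is a sketch under the same example data; the names total, mean and buffered are illustrative, and the last two lines are only there to contrast add.at with ordinary fancy-index addition.

```python
import numpy as np

ind = np.array([1, 1, 2, 3])
b = np.array([2, 3, 4, 5])

total = np.zeros(5)        # float accumulator for the sums
hit = np.zeros(5)          # how many values landed on each index
np.add.at(total, ind, b)   # unbuffered add: repeated indices accumulate
np.add.at(hit, ind, 1)

mean = np.zeros_like(total)
np.divide(total, hit, out=mean, where=hit > 0)  # skip slots with no hits

# For contrast: buffered fancy-index addition applies only one update
# per repeated index, so slot 1 does not reach the full sum of 5.
buffered = np.zeros(5)
buffered[ind] += b
```

Here mean holds 2.5, 4.0 and 5.0 at indices 1, 2 and 3, and 0.0 elsewhere.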

This might not necessarily be cleaner or faster, but here's an alternative that I think is easy to read:

a = [[] for _ in range(5)]
for i, x in zip(ind, b):
    a[i].append(x)
[np.mean(x) if len(x) else 0 for x in a]
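A self-contained version of this grouping approach might look like the sketch below. The names groups and means are illustrative, and converting the result back to an array is an addition, not something the snippet above does.

```python
import numpy as np

ind = np.array([1, 1, 2, 3])
b = np.array([2, 3, 4, 5])

groups = [[] for _ in range(5)]  # one bucket per slot of the target array
for i, x in zip(ind, b):
    groups[i].append(x)          # collect every value aimed at slot i

# Empty buckets get 0.0; np.mean of an empty list would warn and return nan.
means = np.array([np.mean(g) if g else 0.0 for g in groups])
```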
