
Python/Numpy get average of array based on index

I have two NumPy arrays: the first holds the values and the second holds the indexes. I want to average the values array, grouped by the indexes array.

For example:

values = [1,2,3,4,5]
indexes = [0,0,1,1,2]
get_indexed_avg(values, indexes)
# should give me 
#   [1.5,    3.5,    5]

Here, the values in the indexes array represent the indexes in the final array. Hence:

  1. First two items in the values array are being averaged to form the zero index in the final array.
  2. The 3rd and the 4th item in the values array are being averaged to form the first index in the final array.
  3. Finally, the last item is used to form the 2nd index in the final array.

I do have a pure Python solution to this, but it is horrible and very slow. Is there a better way, maybe using NumPy or a similar library?
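The asker's own code is not shown. As a reference point, the grouping described in the three steps above might be written as a plain Python loop like the one below. It is only an illustrative baseline (the function body is an assumption, not the original code), and it skips indexes that never occur instead of leaving a gap for them.

```python
from collections import defaultdict

def get_indexed_avg(values, indexes):
    """Average the entries of `values` that share the same entry in `indexes`."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, index in zip(values, indexes):
        sums[index] += value
        counts[index] += 1
    # Return the averages ordered by index.
    return [sums[i] / counts[i] for i in sorted(sums)]

result = get_indexed_avg([1, 2, 3, 4, 5], [0, 0, 1, 1, 2])
print(result)  # [1.5, 3.5, 5.0]
```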

import pandas as pd
pd.Series(values).groupby(indexes).mean()
# OR
# pd.Series(values).groupby(indexes).mean().to_list()
# 0    1.5
# 1    3.5
# 2    5.0
# dtype: float64

I wanted to avoid pandas, so I spent quite some time figuring it out. One way to do this is with what's called a one-hot encoding.

Creating a one-hot encoding of the indexes gives a 2-D array with one row per value and a single 1 in the column that matches that value's index. For example:

indexes = np.array([0,0,1,1,2])
# one_hot = array(
#    [[1., 0., 0.],
#    [1., 0., 0.],
#    [0., 1., 0.],
#    [0., 1., 0.],
#    [0., 0., 1.]]
# )

We just need a one-hot encoding of the index array. Multiplying it with the values and summing per column gives the per-index sums, which we then divide by the per-index counts. The one-hot construction is taken from another post.

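import numpy as np

values = np.array([1,2,3,4,5])
indexes = np.array([0,0,1,1,2])

# Row i of one_hot has a 1 in column indexes[i]
one_hot = np.eye(np.max(indexes) + 1)[indexes]

# Number of values assigned to each index
counts = np.sum(one_hot, axis=0)
# Sum the values per index, then divide by the counts
average = np.sum((one_hot.T * values), axis=1) / counts

print(average) # [1.5 3.5 5. ]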
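The one-hot matrix has one row per value and one column per index, so it can get large for long arrays. A possible lower-memory alternative, which is a different technique from the one-hot approach above, is `np.bincount`: with `weights` it returns the per-index sums, and without it the per-index counts. This sketch assumes the indexes are non-negative integers and that every index from 0 to the maximum occurs at least once; otherwise the division yields `nan` for the missing ones.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5])
indexes = np.array([0, 0, 1, 1, 2])

# Sum of the values that fall into each index bin
sums = np.bincount(indexes, weights=values)
# Number of values in each index bin
counts = np.bincount(indexes)

average = sums / counts
print(average)  # [1.5 3.5 5. ]
```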

A simple and easy solution:

import numpy as np

values = np.array([1,2,3,4,5])
indexes = np.array([0,0,1,1,2])
index_set = np.unique(indexes) # sorted unique indexes: [0 1 2]

# For each index k, select the values whose index equals k
# and take their average
avg = [float(np.mean(values[indexes==k])) for k in index_set]

print(avg) # [1.5, 3.5, 5.0]
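The loop above scans the whole array once per distinct index. If the indexes are arbitrary labels with gaps, one vectorized option is to let `np.unique` remap them to consecutive positions with `return_inverse=True` and then group with `np.bincount`. This is only a sketch: the `labels` array and its values are hypothetical, chosen to show non-contiguous indexes.

```python
import numpy as np

values = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
labels = np.array([7, 7, 2, 2, 9])  # hypothetical labels with gaps

# unique_labels is sorted; inverse[i] is the position of labels[i] in it
unique_labels, inverse = np.unique(labels, return_inverse=True)

# Per-label sums divided by per-label counts
averages = np.bincount(inverse, weights=values) / np.bincount(inverse)

print(unique_labels)  # [2 7 9]
print(averages)       # [35. 15. 50.]
```

Here `averages[j]` is the mean of the values whose label is `unique_labels[j]`, so the result is ordered by sorted label rather than by first appearance.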

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 