How to get the mean of two histograms

Question

Is there an easy way to get the "mean" of two histograms? For example, I have two lists:

a=[1,2,3,5,6,7]
b=[1,2,3,10]

If I plot a and b using plt.hist(), I get histograms whose x-axis runs from 1 to 10 and whose y-axis is the count of each number.

Now I want to get the mean of a and b, like this:

array([ 1. ,  1. ,  1. ,  0. ,  0.5,  0.5,  0.5,  0. ,  0. ,  0.5])

It's like adding the two histograms together and taking the mean of the y-axis values, with the x-axis still running from 1 to 10.

I know I can loop through the lists to build this mean array:

import numpy as np

d = np.zeros(10)
for x in a:
    d[x - 1] += 1  # value x is counted in slot x-1
for x in b:
    d[x - 1] += 1
d = d / 2  # average over the two lists

But I am wondering whether there is an easier way, something like (a+b)/2, that doesn't need a loop.

Answer 1

How about using the pandas groupby function?

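import pandas as pd

a = [1, 2, 3, 5, 6, 7]
b = [1, 2, 3, 10]

a_b = a + b
# Padding values from min up to (not including) max, so that numbers with no
# data still appear with a count of 0. If you don't need the 0 entries,
# comment out the next line.
c = list(range(min(a_b), max(a_b)))

# B is 1 for real observations and 0 for the padding values in c.
d = {'A': a_b + c, 'B': [1] * len(a_b) + [0] * len(c)}
# If you don't need the 0 entries, use this line instead of the one above.
# d = {'A': a_b, 'B': [1] * len(a_b)}
df = pd.DataFrame(data=d)
df_g = df.groupby('A').sum()

# Divide by the number of lists (2) to get the mean count per value.
# Dividing by df_g.max() only gives the same result when some value
# happens to occur in both lists.
print(list((df_g / 2)['B']))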

Outcome:

[1.0, 1.0, 1.0, 0.0, 0.5, 0.5, 0.5, 0.0, 0.0, 0.5]
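
If you would rather stay in NumPy, one possible sketch is to bin both lists with np.histogram on the same bin edges and then average the counts directly. This is not part of the answer above; the names edges, counts_a, counts_b and mean_counts are made up for illustration, and the code assumes the data are integers, so that one unit-wide bin per integer makes sense.

```python
import numpy as np

a = [1, 2, 3, 5, 6, 7]
b = [1, 2, 3, 10]

# Shared range so both histograms use identical bins (assumes integer data).
lo = min(min(a), min(b))
hi = max(max(a), max(b))

# One unit-wide bin centred on each integer: 0.5, 1.5, ..., hi + 0.5.
edges = np.arange(lo, hi + 2) - 0.5

counts_a, _ = np.histogram(a, bins=edges)
counts_b, _ = np.histogram(b, bins=edges)

# Element-wise mean of the two histograms, with no explicit Python loop.
mean_counts = (counts_a + counts_b) / 2
print(mean_counts)
```

Because both calls share the same edges, the two count arrays line up bin by bin, which is what makes the (a+b)/2 style arithmetic from the question work.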
