How to get the mean of two histograms
Just wondering if there is an easy way to get the "mean" of histograms. For example, I have two lists:
a=[1,2,3,5,6,7]
b=[1,2,3,10]
If I plot a and b using plt.hist(), I get histograms whose x-axis runs from 1 to 10 and whose y-axis is the count of each number.
Now I want to get the mean of a and b, like this:
array([ 1. , 1. , 1. , 0. , 0.5, 0.5, 0.5, 0. , 0. , 0.5])
It's like adding the two histograms together and taking the mean along the y-axis, with the x-axis still being the numbers 1 to 10.
I know I can loop through the lists to get this mean array:
import numpy as np
d = np.zeros(10)
for x in a:
    d[x - 1] += 1  # value v is counted at index v-1
for x in b:
    d[x - 1] += 1
d = d / 2
But I'm wondering if there is an easier way, something like (a+b)/2, that doesn't need a loop.
How about using the pandas groupby function?
import pandas as pd

a = [1, 2, 3, 5, 6, 7]
b = [1, 2, 3, 10]
a_b = a + b
# If you don't need the zero-count values, comment out the line below.
c = list(range(min(a_b), max(a_b)))
# Each real value counts as 1; the filler values in c count as 0.
d = {'A': a_b + c, 'B': [1] * len(a_b) + [0] * len(c)}
# If you don't need the zero-count values, use this line instead of the one above.
# d = {'A': a_b, 'B': [1] * len(a_b)}
df = pd.DataFrame(data=d)
df_g = df.groupby('A').sum()
# Divide by the number of lists (2), not by the largest count, to get the mean.
print((df_g['B'] / 2).tolist())
Outcome:
[1.0, 1.0, 1.0, 0.0, 0.5, 0.5, 0.5, 0.0, 0.0, 0.5]
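If you would rather stay in NumPy, here is a minimal sketch of the same idea without pandas. It assumes the data are integers, so one bin per integer makes sense; the names edges, counts_a, counts_b and mean_counts are just illustrative. The key point is that both histograms are computed over the same bin edges, so their counts can be added element-wise.

```python
import numpy as np

a = [1, 2, 3, 5, 6, 7]
b = [1, 2, 3, 10]

# Shared bin edges so both histograms line up: one bin per integer.
# For values 1..10 the edges are 0.5, 1.5, ..., 10.5 (assumes integer data).
lo = min(min(a), min(b))
hi = max(max(a), max(b))
edges = np.arange(lo, hi + 2) - 0.5

counts_a, _ = np.histogram(a, bins=edges)
counts_b, _ = np.histogram(b, bins=edges)

# Element-wise mean of the two count arrays.
mean_counts = (counts_a + counts_b) / 2
print(mean_counts)

# Shorter alternative for non-negative integers: count the combined list
# with np.bincount, drop the slot for 0, and divide by the number of lists.
mean_counts_alt = np.bincount(a + b, minlength=hi + 1)[1:] / 2
```

Both variants give the same ten values as the loop in the question. The np.histogram version also works when the smallest value is not 1, while the np.bincount shortcut as written assumes the x-axis starts at 1.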