I wanted to calculate the variance for a distribution of discrete values using two different methods, to prove they return identical results:
1. σ**2 = <j**2> - <j>**2
2. σ**2 = <(Δj)**2> = Σ (Δj)**2 * P(j), where Δj = j - <j>
Here's my code:
j = [14,15,16,22,24,25]
Nj = [1,1,3,2,2,5]
N = sum(Nj)
Pj = [Nj[i]/N for i in range(len(j))]
j_mean = sum(Pj[i]*j[i] for i in range(len(j)))
j_sqmean = sum(Pj[i]*j[i]**2 for i in range(len(j)))
var1 = j_mean**2 - j_sqmean
var2 = sum((j[i]-j_mean)*Nj[i] for i in range(len(j)))
print(var1,var2)
For some reason the result is var1 != var2, and I can't figure out where my code goes wrong.
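Both formulas are implemented incorrectly. In var1 the two terms are subtracted in the wrong order, so the result comes out negative. In var2 the deviation is never squared, and it is weighted by the counts Nj instead of the probabilities Pj. Change them to: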
var1 = j_sqmean - j_mean**2
var2 = sum((j[i]-j_mean)**2 * Pj[i] for i in range(len(j)))
print(var1,var2)
# 18.571428571428555 18.57142857142857
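Note that the two printed values still differ in their last digits. That is floating-point rounding, not a mistake in either formula, so an exact `var1 == var2` test is not a reliable check. A minimal sketch of a tolerance-based comparison follows, using the same data as the question. The `zip` loops, the `samples` list, the name `var3`, and the cross-check against `statistics.pvariance` are additions for illustration and are not part of the original answer.

```python
import math
from statistics import pvariance

j = [14, 15, 16, 22, 24, 25]
Nj = [1, 1, 3, 2, 2, 5]
N = sum(Nj)
Pj = [n / N for n in Nj]  # probability of each value

# Expectation values <j> and <j**2>
j_mean = sum(p * x for p, x in zip(Pj, j))
j_sqmean = sum(p * x**2 for p, x in zip(Pj, j))

# Method 1: <j**2> - <j>**2
var1 = j_sqmean - j_mean**2
# Method 2: probability-weighted mean of the squared deviations
var2 = sum(p * (x - j_mean)**2 for p, x in zip(Pj, j))

# Cross-check (illustrative): expand the counts into individual samples
# and let the standard library compute the population variance.
samples = [x for x, n in zip(j, Nj) for _ in range(n)]
var3 = pvariance(samples)

# Compare with a tolerance instead of ==, because of float rounding.
print(math.isclose(var1, var2), math.isclose(var1, var3))
```

`math.isclose` uses a relative tolerance of 1e-9 by default, which is far looser than the rounding differences seen here.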