
I want to fit my histogram with a curve but don't know what to do

I want a normal curve to fit the histogram I already have. navf2 is a list of normally distributed random numbers, the histogram is based on them, and I want a curve that shows the general trend of the histogram.

import numpy as np
import matplotlib.pyplot as plt

navf2 = []
while len(navf2)<252:
    number=np.random.normal(0,1,None)
    navf2.append(number)
bin_edges=np.arange(70,130,1)
plt.style.use(["dark_background",'ggplot'])
plt.hist(navf2, bins=bin_edges, alpha=1)
plt.ylabel("Frequency of final NAV")
plt.xlabel("Ranges")
ymin=0
ymax=100
plt.ylim([ymin,ymax])
plt.show()

Here you go:

=^..^=

from scipy.stats import norm
import numpy as np
import matplotlib.pyplot as plt

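# create raw data (uniform samples as a stand-in; substitute your own data, e.g. navf2)
data = np.random.uniform(size=252)

# estimate the mean and standard deviation of the best-fitting normal distribution
mu, sigma = norm.fit(data)

# evaluate the fitted normal density on a grid of x values
x = np.linspace(-0.5,1.5,100)
y = norm.pdf(x, loc=mu, scale=sigma)

# plot the fitted curve over a histogram normalized to unit area
plt.plot(x, y,'r-')
plt.hist(data, density=True, alpha=1)
plt.show()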

Output:

[Image: enter image description here]
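If you would rather keep raw counts on the y-axis, as in the question, the fitted density can be rescaled instead of the histogram. The following is only a sketch under some assumptions: the seeded generator, the choice of 20 bins and the name curve_in_counts are illustrative and not part of the answer above. It uses the fact that a density multiplied by the number of samples and the bin width gives the expected count per bin.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed only so the sketch is repeatable
data = rng.normal(0, 1, 252)             # stand-in for navf2

# raw counts per bin, which is what plt.hist draws when density is not set
counts, bin_edges = np.histogram(data, bins=20)
bin_width = bin_edges[1] - bin_edges[0]

# for a normal distribution these are the same estimates norm.fit returns
mu, sigma = data.mean(), data.std()

# normal density on a grid spanning the histogram
x = np.linspace(bin_edges[0], bin_edges[-1], 200)
pdf = np.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

# density -> expected count per bin
curve_in_counts = pdf * len(data) * bin_width

# to draw it: plt.hist(data, bins=bin_edges); plt.plot(x, curve_in_counts, 'r-')
```

With this scaling the y-axis label "Frequency" stays accurate, because the bars are still counts.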

Here is another solution, using the code from your question. We can get the expected result without the scipy library. We have to do three things: compute the mean of the data set, compute its standard deviation, and create a function that generates the normal (Gaussian) curve.

To compute the mean we can use the function in the numpy library, i.e. mu = np.mean(your_data_set_here)

The standard deviation of the set is the square root of the mean of the squared differences between the values and the mean (https://en.wikipedia.org/wiki/Standard_deviation). We can express it in code as follows, using the numpy library again:

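data_set = np.asarray(your_data_set_here) # some data set, as an array so the subtraction is element-wise
sigma = np.sqrt(np.sum((data_set-mu)**2)/len(data_set)) # same result as np.std(data_set)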
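As a quick check of that formula, here is a small sketch with made-up sample values (the numbers are purely illustrative). It compares the manual calculation with np.std, which divides by N by default, and shows the ddof=1 variant that divides by N - 1 in case you want the sample standard deviation instead.

```python
import numpy as np

data_set = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # made-up sample values
mu = np.mean(data_set)

# manual formula: square root of the mean squared deviation from the mean
sigma_manual = np.sqrt(np.sum((data_set - mu)**2) / len(data_set))

sigma_numpy = np.std(data_set)            # ddof=0 by default: divides by N
sigma_sample = np.std(data_set, ddof=1)   # divides by N - 1 instead

print(mu, sigma_manual, sigma_numpy, sigma_sample)
```

For 252 values the difference between the two variants is small, so either should give a visually similar curve.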

Finally, we have to build the function for the normal (Gaussian) curve, https://en.wikipedia.org/wiki/Gaussian_function . It relies on both the mean ( mu ) and the standard deviation ( sigma ), so we will use those as parameters in our function:

def Gaussian(x,sigma,mu): # sigma is the standard deviation and mu is the mean
    return ((1/(np.sqrt(2*np.pi)*sigma))*np.exp(-(x-mu)**2/(2*sigma**2)))
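One way to convince yourself the function is right is to check that the curve has unit area and peaks at the mean. The sketch below assumes mu = 0 and sigma = 1 and an arbitrary grid of 4001 points; those values are only for illustration. It also shows that the function accepts a whole NumPy array, so no explicit loop is needed to evaluate it.

```python
import numpy as np

def Gaussian(x, sigma, mu):  # sigma is the standard deviation and mu is the mean
    return (1/(np.sqrt(2*np.pi)*sigma))*np.exp(-(x-mu)**2/(2*sigma**2))

mu, sigma = 0.0, 1.0                                  # assumed values for the check
x = np.linspace(mu - 8*sigma, mu + 8*sigma, 4001)     # wide grid so the tails are negligible
y = Gaussian(x, sigma, mu)                            # evaluated on the whole array at once

dx = x[1] - x[0]
area = np.sum(y) * dx          # simple Riemann sum, should be very close to 1
peak_x = x[np.argmax(y)]       # location of the maximum, should be mu

print(area, peak_x)
```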

Putting it all together looks like this:

import numpy as np
import matplotlib.pyplot as plt

navf2 = []
while len(navf2)<252:
    number=np.random.normal(0,1,None) # mean 0, standard deviation 1: the values cluster around 0, so the original bins from 70 to 130 contain no data
    navf2.append(number)

navf2 = np.asarray(navf2) # convert to an array so the arithmetic below is element-wise
mu = np.mean(navf2) # the mean of all values in navf2
sigma = np.sqrt(np.sum((navf2-mu)**2)/len(navf2)) # standard deviation of navf2
x_vals = np.arange(min(navf2),max(navf2),0.001)    # evenly spaced x values spanning the data,
                                                   # used to draw the curve
gauss = [] # store values for the normal curve here

def Gaussian(x,sigma,mu): # defining the normal curve 
    return ((1/(np.sqrt(2*np.pi)*sigma))*np.exp(-(x-mu)**2/(2*sigma**2)))

for val in x_vals : 
    gauss.append(Gaussian(val,sigma,mu))

plt.style.use(["dark_background",'ggplot'])
plt.hist(navf2, density=True, alpha=1) # density=True normalizes the histogram to unit area, so it is on the same scale as the curve
plt.ylabel("Frequency of final NAV")
plt.xlabel("Ranges")
plt.plot(x_vals,gauss)
plt.show()

Here is a picture of the output:

[Image: random data set with a normal curve]
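To see why the density option matters, here is a small sketch using np.histogram, which computes the same bar heights without drawing anything. The seeded generator and the choice of 10 bins are assumptions for illustration. With density=True the bars have total area 1, like the Gaussian curve; with plain counts the total area is the sample size times the bin width, which is why the unscaled curve looks flat next to a count histogram.

```python
import numpy as np

rng = np.random.default_rng(1)            # fixed seed only so the sketch is repeatable
navf2 = rng.normal(0, 1, 252)             # stand-in for the data in the script above

# bar heights when the histogram is normalized
heights, edges = np.histogram(navf2, bins=10, density=True)
widths = np.diff(edges)
hist_area = np.sum(heights * widths)      # total area of the normalized bars

# bar heights as raw counts over the same bins
counts, _ = np.histogram(navf2, bins=edges)
count_area = np.sum(counts * widths)      # total area of the count bars

print(hist_area, count_area)
```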

Hope this helps. I tried to keep it as close to your original code as possible!
