简体   繁体   中英

How do I calculate PDF (probability density function) in Python?

I have the following code below that prints the PDF graph for a particular mean and standard deviation.

http://imgur.com/a/oVgML

Now I need to find the actual probability, of a particular value. So for example if my mean is 0, and my value is 0, my probability is 1. This is usually done by calculating the area under the curve. Similar to this:

http://homepage.divms.uiowa.edu/~mbognar/applets/normal.html

I am not sure how to approach this problem

import numpy as np
import matplotlib    
import matplotlib.pyplot as plt

def normal(power, mean, std, val):
    a = 1/(np.sqrt(2*np.pi)*std)
    diff = np.abs(np.power(val-mean, power))
    b = np.exp(-(diff)/(2*std*std))
    return a*b

pdf_array = []
array = np.arange(-2,2,0.1)
print array
for i in array:
    print i
    pdf = normal(2, 0, 0.1, i)
    print pdf
    pdf_array.append(pdf)

plt.plot(array, pdf_array)
plt.ylabel('some numbers')
plt.axis([-2, 2, 0, 5])
plt.show()

print 

Unless you have a reason to implement this yourself. All these functions are available in scipy.stats.norm

I think you asking for the cdf , then use this code:

from scipy.stats import norm
print(norm.cdf(x, mean, std))

If you want to write it from scratch:

class PDF():
    def __init__(self,mu=0, sigma=1):
        self.mean = mu
        self.stdev = sigma
        self.data = []

    def calculate_mean(self):
        self.mean = sum(self.data) // len(self.data)
        return self.mean

    def calculate_stdev(self,sample=True):
        if sample:
            n = len(self.data)-1
        else:
            n = len(self.data)
        mean = self.mean
        sigma = 0
        for el in self.data:
            sigma += (el - mean)**2
        sigma = math.sqrt(sigma / n)
        self.stdev = sigma
        return self.stdev

    def pdf(self, x):
        return (1.0 / (self.stdev * math.sqrt(2*math.pi))) * math.exp(-0.5*((x - self.mean) / self.stdev) ** 2)



The area under a curve y = f(x) from x = a to x = b is the same as the integral of f(x)dx from x = a to x = b . Scipy has a quick easy way to do integrals. And just so you understand, the probability of finding a single point in that area cannot be one because the idea is that the total area under the curve is one (unless MAYBE it's a delta function). So you should get 0 ≤ probability of value < 1 for any particular value of interest. There may be different ways of doing it, but a conventional way is to assign confidence intervals along the x-axis like this . I would read up on Gaussian curves and normalization before continuing to code it.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM