Joint distribution of N variables in Python

Question

I need to calculate the joint probability distribution for N variables. I have code for two variables, but I am having trouble generalizing it to higher dimensions. I imagine some sort of Pythonic vectorization could help, but right now my code is very C-like (and yes, I know that is not the right way to write Python). My 2D code is below:

import numpy
import math



feature1 = numpy.array([1.1,2.2,3.0,1.2,5.4,3.4,2.2,6.8,4.5,5.6,1.9,2.8,3.7,4.4,7.3,8.3,8.1,7.0,8.0,6.8,6.2,4.9,5.7,6.3,3.7,2.4,4.5,8.5,9.5,9.9]);
feature2 = numpy.array([11.1,12.8,13.0,11.6,15.2,13.8,11.1,17.8,12.5,15.2,11.6,20.8,14.7,14.4,15.3,18.3,11.4,17.0,16.0,16.8,12.2,14.9,15.7,16.3,13.7,12.4,14.2,18.5,19.8,19.0]);



#===Concatenate All Features===#
numFrames = len(feature1);
allFeatures = numpy.zeros((2,numFrames));
allFeatures[0,:] = feature1;
allFeatures[1,:] = feature2;

#===Create the Array to hold all the Bins===#
numBins = int(0.25*numFrames);
allBins = numpy.zeros((allFeatures.shape[0],numBins+1));

#===Find the maximum and minimum of each feature===#
allRanges = numpy.zeros((allFeatures.shape[0],2));
for f in range(allFeatures.shape[0]):
    allRanges[f,0] = numpy.amin(allFeatures[f,:]);
    allRanges[f,1] = numpy.amax(allFeatures[f,:]);

#===Create the Array to hold all the individual feature probabilities===#
allIndividualProbs = numpy.zeros((allFeatures.shape[0],numBins));

#===Grab all the Individual Probs and the Bins===#
for f in range(allFeatures.shape[0]):
    freqhist, binedges = numpy.histogram(allFeatures[f,:],bins=numBins,range=[allRanges[f,0],allRanges[f,1]],density=False);
    allBins[f,:] = binedges;
    allIndividualProbs[f,:] = freqhist;

#===Create the joint probability array===#
jointProbs = numpy.zeros((numBins,numBins));

#===Compute the joint probability distribution===#
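numElements = 0;
for b1 in range(numBins):
    for b2 in range(numBins):
        # Use one frame index for both features: a joint count pairs values
        # observed together, not every cross-frame combination.
        for f in range(numFrames):
            # Bins are half-open [lo, hi); only the last bin includes its upper edge.
            in1 = (allBins[0,b1] <= feature1[f] < allBins[0,b1+1]) or (b1 == numBins-1 and feature1[f] == allBins[0,b1+1]);
            in2 = (allBins[1,b2] <= feature2[f] < allBins[1,b2+1]) or (b2 == numBins-1 and feature2[f] == allBins[1,b2+1]);
            if in1 and in2:
                jointProbs[b1,b2] += 1;
                numElements += 1;

jointProbs /= numElements;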

#===But what if I add the following===#
feature3 = numpy.array([21.1,21.8,23.5,27.6,25.2,23.8,22.1,22.8,26.5,25.2,28.6,20.8,24.7,24.4,29.3,28.3,27.4,26.0,26.2,26.1,25.9,24.0,22.7,22.3,23.7,26.4,24.2,28.5,29.8,29.0]);

How can I generalize the large loop? For N variables (features) this loop would be enormous. Is there a Pythonic way to do this easily?

Answer 1

Check out the function numpy.histogramdd. This function can compute histograms in an arbitrary number of dimensions. If you set the parameter density=True (named normed in older NumPy releases), it returns a probability density: each bin count divided by the total number of samples and by the bin hypervolume. If you'd prefer something more like a probability mass function (where everything sums to 1), just normalize it yourself. All together, you'll have something like:

import numpy as np
numBins = 10  # number of bins in each dimension
data = np.random.randn(100000, 3)  # generate 100000 3-d random data points
jointProbs, edges = np.histogramdd(data, bins=numBins)
jointProbs /= jointProbs.sum()
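
For the data in the question, the same approach might look like the sketch below. It assumes the features are equal-length 1-D arrays and keeps the question's rule of numBins = 0.25 * numFrames; the helper name joint_pmf is made up for illustration and is not part of NumPy.

```python
import numpy as np

# Feature vectors copied from the question.
feature1 = np.array([1.1, 2.2, 3.0, 1.2, 5.4, 3.4, 2.2, 6.8, 4.5, 5.6,
                     1.9, 2.8, 3.7, 4.4, 7.3, 8.3, 8.1, 7.0, 8.0, 6.8,
                     6.2, 4.9, 5.7, 6.3, 3.7, 2.4, 4.5, 8.5, 9.5, 9.9])
feature2 = np.array([11.1, 12.8, 13.0, 11.6, 15.2, 13.8, 11.1, 17.8, 12.5, 15.2,
                     11.6, 20.8, 14.7, 14.4, 15.3, 18.3, 11.4, 17.0, 16.0, 16.8,
                     12.2, 14.9, 15.7, 16.3, 13.7, 12.4, 14.2, 18.5, 19.8, 19.0])
feature3 = np.array([21.1, 21.8, 23.5, 27.6, 25.2, 23.8, 22.1, 22.8, 26.5, 25.2,
                     28.6, 20.8, 24.7, 24.4, 29.3, 28.3, 27.4, 26.0, 26.2, 26.1,
                     25.9, 24.0, 22.7, 22.3, 23.7, 26.4, 24.2, 28.5, 29.8, 29.0])


def joint_pmf(features, num_bins):
    """Hypothetical helper: joint PMF of a list of equal-length 1-D arrays."""
    # histogramdd expects one row per sample and one column per variable.
    samples = np.column_stack(features)
    counts, edges = np.histogramdd(samples, bins=num_bins)
    # Divide by the total count so that all bins sum to 1.
    return counts / counts.sum(), edges


features = [feature1, feature2, feature3]    # append more arrays for larger N
num_bins = int(0.25 * len(feature1))         # same bin rule as the question
pmf, edges = joint_pmf(features, num_bins)

print(pmf.shape)    # one axis of length num_bins per feature
print(pmf.sum())

# Marginal distribution of feature1: sum the joint PMF over the other axes.
marginal1 = pmf.sum(axis=(1, 2))
```

Adding a fourth feature only means appending another array to the list. Note that the array holds num_bins**N cells, so with few samples most cells stay empty as N grows.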
