
Obtain frequencies in each bin for histogram2d

I have 5 points (x,y) and used NumPy's histogram2d function together with matplotlib to create a heatmap in which different colors denote the density of each bin. How can I obtain the frequency of points in each bin?

    import numpy as np
    import numpy.random
    import pylab as pl
    import matplotlib.pyplot as plt

    x = [.3, -.3, -.3, .3, .3]
    y = [.3, .3, -.3, -.3, -.4]

    heatmap, xedges, yedges = np.histogram2d(x, y, bins=4)
    extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]

    plt.clf()
    plt.imshow(heatmap, extent=extent)
    plt.show()

    pl.scatter(x,y)
    pl.show()

Thus, using 4 bins, I would expect the frequencies in each bin to be .2, .2, .2, and .4.

You're using 4x4 = 16 bins. If you want four bins in total, use 2x2:

In [45]: np.histogram2d(x, y, bins=2)
Out[45]: 
(array([[ 1.,  1.],
       [ 2.,  1.]]),
 array([-0.3,  0. ,  0.3]),
 array([-0.4 , -0.05,  0.3 ]))

You can specify the full shape of the output with a tuple: bins=(2,2)
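As a minimal sketch of the tuple form, the snippet below reuses the x and y lists from the question. The variable names `counts` and `counts_2x1` are only illustrative; the first entry of the tuple is the number of bins along x and the second the number along y.

```python
import numpy as np

x = [.3, -.3, -.3, .3, .3]
y = [.3, .3, -.3, -.3, -.4]

# bins=(nx, ny): number of bins along x, then along y
counts, xedges, yedges = np.histogram2d(x, y, bins=(2, 2))

# counts[i, j] is the number of points whose x falls in x-bin i
# and whose y falls in y-bin j
print(counts.shape)
print(counts)

# The two axes need not use the same number of bins
counts_2x1, _, _ = np.histogram2d(x, y, bins=(2, 1))
print(counts_2x1.shape)
```

The total of `counts` always equals the number of input points that fall inside the bin range, here all 5.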

If you want to normalize the output, use normed=True (spelled density=True in recent NumPy releases):

In [50]: np.histogram2d(x, y, bins=2, normed=True)
Out[50]: 
(array([[ 1.9047619 ,  1.9047619 ],
       [ 3.80952381,  1.9047619 ]]),
 array([-0.3,  0. ,  0.3]),
 array([-0.4 , -0.05,  0.3 ]))

To get the frequencies instead, divide the counts by their total:

heatmap, xedges, yedges = np.histogram2d(x, y, bins=4)
heatmap /= heatmap.sum()

In [57]: heatmap, xedges, yedges = np.histogram2d(x, y, bins=4)

In [58]: heatmap
Out[58]: 
array([[ 1.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 2.,  0.,  0.,  1.]])

In [59]: heatmap /= heatmap.sum()

In [60]: heatmap
Out[60]: 
array([[ 0.2,  0. ,  0. ,  0.2],
       [ 0. ,  0. ,  0. ,  0. ],
       [ 0. ,  0. ,  0. ,  0. ],
       [ 0.4,  0. ,  0. ,  0.2]])

Note that if you use normed=True, then heatmap.sum() will in general not equal 1; rather, the heatmap multiplied by the bin area sums to 1. That makes heatmap a probability density, but its values are not exactly the frequencies you asked for.
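
A short check of that statement, assuming a NumPy version where the keyword is `density=True` (older releases used `normed=True`). The names `dens`, `areas`, and `freq` are illustrative.

```python
import numpy as np

x = [.3, -.3, -.3, .3, .3]
y = [.3, .3, -.3, -.3, -.4]

# density=True returns counts / (total count * bin area)
dens, xedges, yedges = np.histogram2d(x, y, bins=2, density=True)

# Area of every bin: outer product of the bin widths along x and y
areas = np.outer(np.diff(xedges), np.diff(yedges))

print(dens.sum())            # not 1 in general
print((dens * areas).sum())  # approximately 1

# Multiplying the density by the bin area recovers the frequencies
freq = dens * areas
print(freq)
```

Computing `areas` per bin rather than as a single number means the same check also holds for unequal bin widths.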
