Histogram configuration

Question

We have a set of data and want to plot its histogram on a logarithmic scale. We use the following code:

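import numpy as np
import matplotlib.pyplot as p  # assuming p is matplotlib.pyplot

# hist_data is the 1-D array of samples
y, binEdges = np.histogram(hist_data, bins=200)
bincenters = 0.5 * (binEdges[1:] + binEdges[:-1])  # midpoint of each bin
p.plot(bincenters, y, '-')
p.yscale('log', nonpositive='clip')  # spelled nonposy in Matplotlib < 3.3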

The result is: [image: number of bins = 200]

However, when I increase the number of bins (i.e. from bins=200 to bins=600), the result is: [image: number of bins = 600]

How can I keep only the lines and not the whole solid band that each histogram turns into?

Answer 1

What you are seeing is that some of the bins are empty, so the plot goes from f(y) -> 0 -> f(y+delta) -> 0 -> f(y+2*delta). A common trick to get around this is not to use a sharp cutoff as your bin (we call it a kernel). You can use, for example, kernel density estimation (KDE) to "smooth" out the histogram. In this case you place a Gaussian centered at each of your data points; their sum reflects the underlying probability distribution. You can use scipy to perform the KDE, or the seaborn package, which will do it and the plotting automatically. The picture from the linked seaborn example gives a nice illustration of this:

[image: seaborn kernel density estimation example]
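As a rough sketch of the scipy route mentioned above, the snippet below uses scipy.stats.gaussian_kde with its default bandwidth. The random hist_data, the seed and the 400-point grid are stand-ins chosen for illustration, not values from the question.

```python
import numpy as np
import matplotlib.pyplot as p
from scipy.stats import gaussian_kde

# Hypothetical stand-in for the real data set.
rng = np.random.default_rng(0)
hist_data = rng.normal(size=5000)

# Fit a Gaussian KDE; the bandwidth is chosen by scipy's default rule.
kde = gaussian_kde(hist_data)

# Evaluate the smooth density on an evenly spaced grid over the data range.
xs = np.linspace(hist_data.min(), hist_data.max(), 400)
density = kde(xs)

# A Gaussian KDE never hits exactly zero between samples,
# so the curve has no drops to the bottom of a log axis.
p.plot(xs, density, '-')
p.yscale('log')
```

Note that the curve is a probability density, not a count. To compare it with histogram counts, multiply it by the number of samples times the bin width.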

To use matplotlib's hist and draw only the outline instead of filled boxes, pass histtype="step".
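A minimal sketch of that call, assuming p is matplotlib.pyplot as in the question. The random hist_data is a stand-in for the real data, and log=True is one way to get the logarithmic count axis.

```python
import numpy as np
import matplotlib.pyplot as p

# Hypothetical stand-in for the real data set.
rng = np.random.default_rng(0)
hist_data = rng.normal(size=5000)

# histtype="step" draws only the outline of the histogram;
# log=True puts the counts on a logarithmic axis.
counts, edges, _ = p.hist(hist_data, bins=600, histtype="step", log=True)
```

Here hist does the binning itself, so the separate np.histogram call is not needed.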

Answer 2

If some of the bins are empty, you can filter them out using boolean indexing:

p.plot(bincenters[y>0],y[y>0],'-')

2 answers

solution1
2 ACCEPTED 2014-12-14 18:37:26

solution2
1 2014-12-14 18:29:47
