Numpy.histogram joining bins

I have some image data that I have plotted as a histogram using NumPy, as shown in the code below. The problem is that the x-axis goes up in steps of 1, but the bin width is approximately 1.3 (I estimated this roughly by zooming in and looking at the bin width).

This results in a histogram that looks like this:

[Image: histogram with 100 bins; the curve drops to zero at regular intervals]

As you can see, at certain points the histogram drops to zero. If I zoom in, the points where the value is 0 are NOT integers. Because my data are integers, a value such as 550.8 will obviously appear 0 times, which I think is what gives the histogram the appearance above.

I can get around this problem by increasing the number of bins from 100 to 1000. This leaves me with the histogram below:

[Image: histogram with 1000 bins]

So I've finally got to my question (apologies for the long post!): is there a way to join the bins using np.histogram when using a large number of them, as I am doing to get around my initial problem? I suppose this is just aesthetics and isn't essential, but it would look better.

There are other posts on here that I have looked at, but almost all of them use plt.hist for their histogram rather than np.histogram.

My code:

import numpy as np
import matplotlib.pyplot as plt

def histo():

    heights,edges = np.histogram(data, bins=100, range=(minvalue,maxvalue))
    edges = edges[:-1]+(edges[1]-edges[0]) ### not entirely sure what this line is actually doing

    fig, ax = plt.subplots()
    ax.plot(edges,heights)                                                 

    ax.set(title=title, xlabel='ADC Value(DN/40)', ylabel='Frequency')

    #do some analysis of the data between two clicks

    point1, point2 = fig.ginput(2)                                          
    ax.axvspan(point1[0], point2[0], color='blue', alpha=0.5)            
    mask = (edges>point1[0]) & (edges<point2[0])

    ## more analysis code ##       


data = someimage_data

histo()
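The line marked "not entirely sure what this line is actually doing" can be checked on a toy array. This is a small sketch on hypothetical integer data (not the asker's image): `np.histogram` returns one more edge than it returns counts, and adding one full bin width to the left edges simply reproduces the right edges, whereas the bin centres need only half a bin width.

```python
import numpy as np

# Hypothetical integer-valued data standing in for the image pixels.
data = np.array([3, 4, 4, 5, 5, 5, 6, 6, 7])
heights, edges = np.histogram(data, bins=4, range=(data.min(), data.max()))

width = edges[1] - edges[0]                # uniform bin width (here 1.0)
shifted = edges[:-1] + width               # what the line in histo() computes
centers = (edges[:-1] + edges[1:]) / 2     # midpoint of each bin

print(len(edges), len(heights))            # one more edge than there are bins
print(np.allclose(shifted, edges[1:]))     # the shifted values are the right edges
print(centers)
```

For uniform bins, `edges[:-1] + width / 2` gives the same centres; the `(edges[:-1] + edges[1:]) / 2` form also works when the bins have different widths.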

As you suspect yourself, the problem is that your integer data need custom-fit bins to produce a pretty histogram. As a matter of fact, this is usually true for histograms.

Consider the following reconstruction of your problem:

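import numpy as np
import matplotlib.pyplot as plt

# generate integer-valued data and keep only values strictly between 560 and 650
data = np.floor(np.random.randn(10000)*20+620)
data = data[(560<data) & (data<650)]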

# do what you're doing
heights,edges = np.histogram(data, bins=100, range=(data.min(),data.max()))
edges = edges[:-1]+(edges[1]-edges[0]) # shift first x coordinate to edges[1]
                                     # and drop last point: 1 more edge than bins

fig, ax = plt.subplots()
ax.plot(edges,heights)

The result is convincingly ugly:

[Image: before (100 bins)]

The problem is that you're using 100 bins, but your integer values lie between 560 and 650, so there are fewer than 100 possible values. Each bin is therefore narrower than 1, and a few bins will certainly contain no integer at all: those bins are empty!
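A quick way to confirm this, sketched on hypothetical data (each integer from 561 to 649 exactly once, i.e. the values that can survive the filter above): with 100 bins of width 0.88 no bin can hold two integers, so 100 − 89 = 11 bins must be empty. The expression `np.ceil(left) < right` tells, from the edges alone, whether a half-open bin contains any integer; the last bin is closed on the right in `np.histogram`, so it gets `<=` instead.

```python
import numpy as np

# Hypothetical data: each integer from 561 to 649 exactly once (89 values).
data = np.arange(561, 650)
heights, edges = np.histogram(data, bins=100, range=(data.min(), data.max()))

left, right = edges[:-1], edges[1:]
# A half-open bin [left, right) contains an integer iff ceil(left) < right;
# np.histogram closes the last bin on the right, so use <= there.
has_integer = np.ceil(left) < right
has_integer[-1] = np.ceil(left[-1]) <= right[-1]

print("bin width:", right[0] - left[0])
print("empty bins:", np.count_nonzero(heights == 0))
print("bins containing no integer:", np.count_nonzero(~has_integer))
```

Because every integer in the range is present here, the empty bins are exactly the bins that contain no integer.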

One easy solution is to use a bin count slightly smaller than the number of possible unique integer values:

# do what you're doing
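value_range = [data.min(),data.max()]   # avoid shadowing the built-in range()
n_bins = int(np.ceil((value_range[1]-value_range[0])*0.95))   # bins must be an int, not a float
heights,edges = np.histogram(data, bins=n_bins, range=value_range)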
edges = edges[:-1]+(edges[1]-edges[0]) # shift first x coordinate to edges[1]

fig, ax = plt.subplots()
ax.plot(edges,heights)

It's getting better:

[Image: better (bin count slightly below the number of integer values)]

but there are still clear artifacts, because a few bins contain two integers while the others contain only one. This is a less shocking instance of the original problem.
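The same kind of check explains these remaining artifacts. Again on hypothetical data (each integer from 561 to 649 once), the 0.95 rule gives `ceil(88 * 0.95)` = 84 bins, each slightly wider than 1. Every bin now holds at least one integer, but 89 integers spread over 84 bins means five bins hold two, and those show up as spikes.

```python
import numpy as np

# Hypothetical data: each integer from 561 to 649 exactly once (89 values).
data = np.arange(561, 650)
value_range = (data.min(), data.max())
n_bins = int(np.ceil((value_range[1] - value_range[0]) * 0.95))  # the 0.95 rule
heights, edges = np.histogram(data, bins=n_bins, range=value_range)

# Tally how many bins hold one integer and how many hold two.
values, how_many = np.unique(heights, return_counts=True)
print("bins:", n_bins, "bin width:", edges[1] - edges[0])
for v, c in zip(values, how_many):
    print(f"{c} bins hold {v} integer(s)")
```

Any bin width strictly between 1 and 2 behaves like this, so shrinking the bin count further only changes how many bins get the extra integer.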

The ultimate solution is to use bins tailor-made for your problem: pass an array_like as bins, so that each bin contains a single integer. I suggest using np.arange(), shifted down by 0.5:

# one unit-wide bin centred on each integer value
value_range = [data.min(),data.max()]
bins = np.arange(value_range[0],value_range[1]+2) - 0.5   # edges min-0.5, ..., max+0.5
heights,edges = np.histogram(data, bins=bins)              # range= is not needed when bins is an array of edges
centers = (edges[:-1]+edges[1:])/2                         # bin centres are the integer values themselves

fig, ax = plt.subplots()
ax.plot(centers,heights)

And it's as pretty as can be!
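As a sanity check of this final approach, here is a sketch on hypothetical random integer data (not the asker's image): with edges at k − 0.5, each bin holds exactly one integer value, the bin centres are the integers themselves, and the counts agree with `np.bincount` applied to the data shifted to start at zero. `np.bincount` is an alternative not used in the answer above; it only accepts non-negative integers, hence the shift and the `astype(int)`.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical integer data, filtered to the same window as the reconstruction.
data = np.floor(rng.normal(620, 20, 10_000))
data = data[(560 < data) & (data < 650)].astype(int)

lo, hi = data.min(), data.max()
bins = np.arange(lo, hi + 2) - 0.5           # edges lo-0.5, lo+0.5, ..., hi+0.5
heights, edges = np.histogram(data, bins=bins)
centers = (edges[:-1] + edges[1:]) / 2       # exactly the integers lo..hi

# Alternative for integer data: count each value directly.
counts = np.bincount(data - lo, minlength=hi - lo + 1)

print(np.array_equal(centers, np.arange(lo, hi + 1)))
print(np.array_equal(heights, counts))
```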

[Image: best (one bin per integer)]

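Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please cite this site's URL or the original URL. For any questions, contact yoyou2525@163.com.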

© 2020-2024 STACKOOM.COM