Histogram with bins a percentage of values?

I am creating a histogram in Python and I want the bin edges to be a percentage (5-10%) of given values. What would be the best way to go about this so that I don't leave gaps between the bin boundaries and don't have to pre-set some values for the bin boundary calculation?

In general, it's convenient to create histograms using pre-defined tools like numpy.histogram, though your newly posted comment, suggesting that you're using matplotlib, is also totally fine. Either way allows you to create a set number of automatically determined bins of equal width...

>>> import numpy
>>> data = [0,1,1,1,1,1,1,2,3,3]
>>> hist, edges = numpy.histogram(data, bins=10)
>>> hist
array([1, 0, 0, 6, 0, 0, 1, 0, 0, 2])
>>> edges
array([ 0. ,  0.3,  0.6,  0.9,  1.2,  1.5,  1.8,  2.1,  2.4,  2.7,  3. ])

...Or, in the odd case where you want predefined bins (possibly of different widths), you can specify the bin edges yourself (read the docs for information on how this works):

>>> hist, edges = numpy.histogram(data, bins=[0, .5, 1., 1.5, 2, 3])
>>> hist
array([1, 0, 6, 0, 3])
>>> edges
array([ 0. ,  0.5,  1. ,  1.5,  2. ,  3. ])

Just be careful about using drastically different bin sizes, however. In many cases this sort of coarse graining could distort the relationships between the numbers you're trying to compare.
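One way to soften that distortion is to compare densities instead of raw counts. The sketch below reuses the data and unequal edges from above and relies on the `density` argument of numpy.histogram, which divides each count by the total count and by the bin width. The variable names are only illustrative.

```python
import numpy

data = [0, 1, 1, 1, 1, 1, 1, 2, 3, 3]
edges = [0, 0.5, 1.0, 1.5, 2, 3]  # the last bin is twice as wide as the others

# Raw counts: the wide last bin collects everything from 2 to 3.
counts, _ = numpy.histogram(data, bins=edges)

# Densities: count / (total count * bin width), so wide bins are scaled down.
density, _ = numpy.histogram(data, bins=edges, density=True)

# The densities integrate to 1 over the binned range.
widths = numpy.diff(edges)
total_area = float((density * widths).sum())

print(counts)
print(density)
print(total_area)
```

With densities, the last bin is no longer inflated just because it covers a wider range of values.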

As for your value +/- 10% boundaries:

>>> preferred_bin_centers = [0,1,2,3]
>>> bin_pairs = [ ( 0.9*v , 1.1*v ) for v in preferred_bin_centers ]
>>> bin_pairs
[(0.0, 0.0), (0.9, 1.1), (1.8, 2.2), (2.7, 3.3000000000000003)]

Or, flattened into a list form that could be used by numpy.histogram...

>>> bin_edges = sum( [ [ 0.9*v , 1.1*v ] for v in preferred_bin_centers ] , [] )
>>> bin_edges
[0.0, 0.0, 0.9, 1.1, 1.8, 2.2, 2.7, 3.3000000000000003]

(Note from the first two items of the above list that this code gives confusing bin edges if one of your bin centers is 0; I left that in solely as an example of what to watch out for.)
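One way to guard against that case is to check the centers before building the edges. This is only a sketch: `percent_edges` is a hypothetical helper, not part of numpy, and it assumes positive centers, because a percentage window around 0 has zero width and one around a negative number comes out reversed.

```python
def percent_edges(centers, pct=0.10):
    """Return flat [low, high, low, high, ...] edges at +/- pct around each center.

    Hypothetical helper: it raises instead of silently producing a
    zero-width or overlapping bin.
    """
    edges = []
    for c in sorted(centers):
        if c <= 0:
            raise ValueError("centers must be positive, got %r" % (c,))
        low, high = c * (1 - pct), c * (1 + pct)
        if edges and low <= edges[-1]:
            raise ValueError("window around %r overlaps the previous one" % (c,))
        edges.extend([low, high])
    return edges


edges = percent_edges([1, 2, 3])
print(edges)
```

The overlap check matters because numpy.histogram expects its bin edges to increase monotonically.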

Incidentally, the bin edges as defined above will also create intermediate bins outside your desired range. For example, if you bin items within +/- 10% of 1, 2, and 3, then inherently there will also be a bin between 2.2 and 2.7 (the "outside edges" of your desired bins) where numbers like 2.5 would go. If you have values that fall in between your desired bins, then you may want to adjust your cutoffs or visualization accordingly.
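To make those in-between bins concrete, here is a small sketch with made-up data and the +/- 10% edges around 1, 2, and 3. Because the edges alternate between the low and high side of each window, every second count belongs to a gap rather than to a window you asked for.

```python
import numpy

# Made-up sample: 2.5 falls between the windows around 2 and 3.
data = [0.95, 1.0, 1.05, 2.0, 2.5, 2.9, 3.1]
edges = [0.9, 1.1, 1.8, 2.2, 2.7, 3.3]  # +/- 10% windows around 1, 2, 3

counts, _ = numpy.histogram(data, bins=edges)

wanted = counts[0::2]  # bins 0, 2, 4: the windows around 1, 2, 3
gaps = counts[1::2]    # bins 1, 3: the spaces between the windows

print(wanted)
print(gaps)
```

A nonzero entry in `gaps` is a sign that the cutoffs are leaving data out of the bins you actually care about.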

Maybe I'm oversimplifying your question?

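def bins(data, nbins):
    # Return nbins + 1 equally spaced edges covering min(data)..max(data).
    lo = min(data)
    span = max(data) - lo  # avoid naming this "range", which shadows the builtin
    binsize = span / float(nbins)
    return [lo + i * binsize for i in range(nbins + 1)]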
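A comparable set of equal-width edges can be sketched with numpy.linspace, assuming numpy is available as in the earlier answer. The data below is the same example list used there; `nbins` is an illustrative choice.

```python
import numpy

data = [0, 1, 1, 1, 1, 1, 1, 2, 3, 3]
nbins = 10

# nbins bins need nbins + 1 edges, running from min(data) to max(data) inclusive.
edges = numpy.linspace(min(data), max(data), nbins + 1)

# Passing the edges explicitly gives the same binning as bins=nbins here.
hist, returned_edges = numpy.histogram(data, bins=edges)

print(edges)
print(hist)
```

Since the edges are contiguous, there are no gaps between bins, and nothing has to be pre-set beyond the number of bins.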

