How to get data in a histogram bin

I want to get a list of the data contained in a histogram bin. I am using numpy and Matplotlib. I know how to traverse the data and check the bin edges. However, I want to do this for a 2D histogram, and the code to do so is rather ugly. Does numpy have any constructs to make this easier?

For the 1D case, I can use searchsorted(). But the logic is not much better, and I don't really want to do a binary search on each data point when I don't have to.

Most of the nasty logic is due to the bin boundary regions. All bins are half-open, like this: [left edge, right edge). The exception is the last bin, which is closed on both sides: [left edge, right edge].

Here is some sample code for the 1D case:

import numpy as np

data = [0, 0.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5, 3]

hist, edges = np.histogram(data, bins=3)

print('data =', data)
print('histogram =', hist)
print('edges =', edges)

getbin = 2  # 0, 1, or 2
last_bin = len(edges) - 2  # index of the final bin, whose right edge is inclusive

print('---')
print('alg 1:')

for d in data:
    if d >= edges[getbin]:
        if getbin == last_bin or d < edges[getbin + 1]:
            print('found:', d)

print('---')
print('alg 2:')

for d in data:
    val = np.searchsorted(edges, d, side='right') - 1
    # a value equal to the rightmost edge lands one past the last bin,
    # so it only counts when the last bin is the one requested
    if val == getbin or (getbin == last_bin and val == last_bin + 1):
        print('found:', d)

Here is some sample code for the 2D case:

import numpy as np

xdata = [0, 1.5, 1.5, 2.5, 2.5, 2.5, \
         0.5, 0.5, 0.5, 0.5, 1.5, 1.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, \
         0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 3]
ydata = [0, 5,5, 5, 5, 5, \
         15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, \
         25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 30]

xbins = 3
ybins = 3
hist2d, xedges, yedges = np.histogram2d(xdata, ydata, bins=(xbins, ybins))

print('data2d =', list(zip(xdata, ydata)))
print('hist2d =')
print(hist2d)
print('xedges =', xedges)
print('yedges =', yedges)

getbin2d = 5  # 0 through 8

print('find data in bin #', getbin2d)

xedge_i = getbin2d % xbins
yedge_i = getbin2d // xbins  # IMPORTANT: this is xbins

for x, y in zip(xdata, ydata):
    # x and y left edges
    if x >= xedges[xedge_i] and y >= yedges[yedge_i]:
        # x right edge (inclusive for the last bin)
        if xedge_i == xbins - 1 or x < xedges[xedge_i + 1]:
            # y right edge (inclusive for the last bin)
            if yedge_i == ybins - 1 or y < yedges[yedge_i + 1]:
                print('found:', x, y)

Is there a cleaner / more efficient way to do this? It seems like numpy would have something for this.

digitize, from core NumPy, will give you the index of the bin to which each value in your data belongs:

import numpy as NP
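# 100 random integers in [0, 100), so they spread across all the bins below
A = NP.random.randint(0, 100, 100)

bins = NP.array([0., 20., 40., 60., 80., 100.])

# d is an index array holding the bin id for each point in A
# (1-based: d == i means bins[i-1] <= value < bins[i])
d = NP.digitize(A, bins)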
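
To connect this back to the question, here is a minimal sketch of how the digitize output could be used to pull out the values in one bin of a np.histogram result. The variable names (idx, getbin, in_bin) are illustrative only. The one adjustment it assumes is needed is for values equal to the rightmost edge: np.histogram counts them in the last bin, while digitize (with its default right=False) places them one index past it.

```python
import numpy as np

data = np.array([0, 0.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5, 3])
hist, edges = np.histogram(data, bins=3)

# digitize is 1-based for in-range values, so subtract 1 for 0-based bin numbers
idx = np.digitize(data, edges) - 1

# fold values equal to the rightmost edge into the last bin,
# matching np.histogram's closed final interval
idx[data == edges[-1]] = len(edges) - 2

getbin = 2                    # 0, 1, or 2
in_bin = data[idx == getbin]  # all data values that fall in that bin
print(in_bin)
```

The whole array is binned in one vectorized call, so selecting another bin afterwards is just another boolean comparison against idx.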

How about something like:

import numpy

data = numpy.array([0, 0.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5, 3])

hist, edges = numpy.histogram(data, bins=3)

for l, r in zip(edges[:-1], edges[1:]):
   print(data[(data > l) & (data < r)]) 

Out:

[ 0.5]
[ 1.5  1.5  1.5]
[ 2.5  2.5  2.5]

You would need a bit of extra code to handle the edge cases: with strict inequalities, values that fall exactly on a bin edge (0 and 3 here) are dropped.
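
As a rough sketch of what that edge handling might look like, extended to the 2D case from the question: the helper below (points_in_bin, a hypothetical name, not a NumPy function) builds a boolean mask using [left, right) for every bin except the last one along each axis, which is treated as [left, right]. The small x and y arrays are made-up sample data.

```python
import numpy as np

def points_in_bin(x, y, xedges, yedges, ix, iy):
    """Boolean mask of the points that fall in 2D bin (ix, iy)."""
    def in_1d(v, edges, i):
        left, right = edges[i], edges[i + 1]
        is_last = (i == len(edges) - 2)
        # the last bin includes its right edge; all others exclude it
        upper = (v <= right) if is_last else (v < right)
        return (v >= left) & upper
    return in_1d(x, xedges, ix) & in_1d(y, yedges, iy)

x = np.array([0, 0.5, 1.5, 2.5, 2.5, 3])
y = np.array([0, 15, 15, 25, 5, 30])
hist2d, xedges, yedges = np.histogram2d(x, y, bins=(3, 3))

mask = points_in_bin(x, y, xedges, yedges, 2, 2)
print(list(zip(x[mask], y[mask])))
```

Here the bin is addressed by its (ix, iy) pair, which matches how hist2d itself is indexed (hist2d[ix, iy]); a flat bin number like the one in the question would need to be split into the two indices first.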

pyplot.hist in matplotlib creates a histogram (but also draws it to the screen, which you might not want). For just the bins, you can use numpy.histogram, as outlined in another answer.

Here is an example comparing pyplot.hist and numpy.histogram.
