
Retrieving bin data from 2d histograms in numpy

I've managed to use numpy.histogram2d() to allot around 200 points into bins. However, what I cannot figure out is how to access what values are stored in each bin.

Any idea how to go about that?

From the NumPy documentation:

import numpy as np
xedges = [0, 1, 1.5, 3, 5]
yedges = [0, 2, 3, 4, 6]
x = np.random.normal(3, 1, 100)
y = np.random.normal(1, 1, 100)
# x is binned with xedges (first axis of H), y with yedges (second axis)
H, xedges, yedges = np.histogram2d(x, y, bins=(xedges, yedges))

H contains the two-dimensional histogram counts. If xedges has length m and yedges has length n, H will have shape (m-1, n-1).

You can also specify the number of bins for each dimension:

x = np.random.normal(3, 1, 100)
y = np.random.normal(1, 1, 100)
H, xedges, yedges = np.histogram2d(x, y, bins=(5, 6))

The shape of H will then be the same shape you provided in the bins keyword: (5, 6)
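As a quick check of the two shape rules above, here is a minimal sketch. The seeded generator and the names x_edges_in, y_edges_in, H_edges and H_counts are only illustrative choices, not part of the original example.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
x = rng.normal(3, 1, 100)
y = rng.normal(1, 1, 100)

# Explicit edges: 5 x edges and 4 y edges give 4 x 3 bins.
x_edges_in = [0, 1, 1.5, 3, 5]
y_edges_in = [0, 2, 3, 4]
H_edges, xe, ye = np.histogram2d(x, y, bins=(x_edges_in, y_edges_in))
print(H_edges.shape)  # (4, 3)

# Bin counts instead of edges: H takes the shape given in bins,
# and each returned edge array has one more entry than there are bins.
H_counts, xe2, ye2 = np.histogram2d(x, y, bins=(5, 6))
print(H_counts.shape)      # (5, 6)
print(len(xe2), len(ye2))  # 6 7
```

Note that H only stores how many points landed in each bin; the points themselves are not kept, which is why the answers below go back to the original data.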

I just tried this example from the matplotlib manual.

Note the call hist, xedges, yedges = np.histogram2d(x, y, bins=4).

The function returns three values. hist is a 2D array holding the number of points that fell into each bin; it is the same array you would pass to imshow to plot a projection of this histogram.

I am currently facing the same challenge and haven't found a solution online or in the documentation.

So here's what I have come up with:

import numpy as np
import matplotlib.pyplot as plt

# Say you have the following coordinate points:
data = np.array([[-73.589,  45.490],
                 [-73.591,  45.497],
                 [-73.592,  45.502],
                 [-73.574,  45.531],
                 [-73.552,  45.534],
                 [-73.570,  45.512]])

# These following variables are to determine the range we want for the bins. I use 
# values a bit wider than my max and min values for x and y
extenti = (-73.600, -73.540)
extentj = (45.480, 45.540)

# Run numpy's histogram2d function to return two variables we'll be using 
# later: hist and edges
hist, *edges = np.histogram2d(data[:,0], data[:,1], bins=4, range=(extenti, extentj))

# You can visualize the histogram using matplotlib's own 2D histogram.
# Pass the same range so the plotted bins match the ones computed above:
plt.hist2d(data[:,0], data[:,1], bins=4, range=(extenti, extentj))

# We'll use numpy's digitize now. According to NumPy's documentation, numpy.digitize
# returns the indices of the bins to which each value in the input array belongs.
# It works on one axis at a time, so I haven't found a way to apply it directly
# to a 2D histogram. You might manage to, but for now, the following has been
# working well for me:

# Run np.digitize once along the x axis of our data, using edges[0].
# edges[0] contains the x-axis edges of the numpy.histogram2d we
# made earlier. This gives the x-axis bin index of each data point.
hitx = np.digitize(data[:, 0], edges[0])
# Now run it along the y axis, using edges[1]
hity = np.digitize(data[:, 1], edges[1])

# Now we put those together.
hitbins = list(zip(hitx, hity))

# And now we can associate our data points with the coordinates of the bin where
# each belongs
data_and_bins = list(zip(data, hitbins))

From there we can choose a bin by its coordinates and find the data points associated with it!

You can do stuff like:

[item[0] for item in data_and_bins if item[1] == (1, 2)]

Here (1, 2) are the coordinates of the bin from which you want to retrieve the data. In our case there are two data points in that bin, and the line above lists them.

Just keep in mind that np.digitize(), which we used, reports out-of-bounds values as 0 or len(bins), so the first bin has coordinates (1, 1) rather than (0, 0). Subtract 1 from each index if you want to address the matching cell of hist.
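If you would rather work with 0-based indices that line up with hist, something like the sketch below may help. The helper bin_index is a hypothetical name of mine, not a NumPy function. It also covers one difference I am aware of: histogram2d counts a value lying exactly on the last edge in the last bin, while np.digitize reports it as out of bounds.

```python
import numpy as np

data = np.array([[-73.589, 45.490],
                 [-73.591, 45.497],
                 [-73.592, 45.502],
                 [-73.574, 45.531],
                 [-73.552, 45.534],
                 [-73.570, 45.512]])
extenti = (-73.600, -73.540)
extentj = (45.480, 45.540)

hist, xedges, yedges = np.histogram2d(data[:, 0], data[:, 1],
                                      bins=4, range=(extenti, extentj))

def bin_index(values, edges):
    """Hypothetical helper: 0-based bin index for each value.

    Out-of-range values come back as -1 (too low) or len(edges) - 1 (too high).
    """
    idx = np.digitize(values, edges) - 1
    # histogram2d closes the last bin on the right; digitize does not.
    idx[values == edges[-1]] = len(edges) - 2
    return idx

ix = bin_index(data[:, 0], xedges)
iy = bin_index(data[:, 1], yedges)

# Bin (1, 2) in digitize numbering is hist[0, 1] in 0-based numbering.
in_bin = (ix == 0) & (iy == 1)
points = data[in_bin]
print(points)           # the rows of data that fell into that bin
print(int(hist[0, 1]))  # 2
```

The boolean mask replaces the list comprehension, and the number of selected rows should always equal the count stored in the matching cell of hist.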

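Also check that you and numpy agree on which bin is the "first" one. In the array returned by histogram2d, the first index follows x and the second follows y, so hist[0, 0] is the bin with the lowest x and lowest y edges. Plotted with the origin at the bottom, the counting runs from bottom-left to upper-right.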
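One way to settle the orientation question is to histogram a single point whose bin you already know. This is just a small sketch with made-up values:

```python
import numpy as np

# One point in the lowest x bin and the highest y bin.
x = np.array([0.5])
y = np.array([2.5])
hist, xedges, yedges = np.histogram2d(x, y, bins=(2, 3),
                                      range=((0, 2), (0, 3)))

print(hist.shape)              # (2, 3): first axis is x, second axis is y
print(np.argwhere(hist == 1))  # [[0 2]] -> x bin 0, y bin 2
```

So hist[i, j] refers to x bin i and y bin j. Because image functions such as imshow treat the first axis as rows, the NumPy documentation transposes the array (hist.T) and uses origin='lower' when displaying it.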

Hope this helps you or whoever else encounters this challenge.

I looked into this issue a lot as well. In particular, I tried to extract the information from image, one of the outputs of matplotlib's hist2d, but that always failed. So I wrote this nested loop. I know it is brute force and nowhere near an elegant solution, but it may still make someone's life easier at some point. Here it is:

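# Assumes nbins, xedges, Array_x_axis, Width_t (x values) and Pprom_t (y values)
# are already defined, and that numpy is imported as np.
for bin_fl in range(nbins):
    fl_elm = []
    Pprom_elm = []
    for elm in range(len(Array_x_axis)):
        # Keep only elements inside this bin: from its lower edge xedges[bin_fl]
        # up to (not including) its upper edge xedges[bin_fl+1]. Without the
        # lower bound, every element below the upper edge would be collected.
        if xedges[bin_fl] <= Width_t[elm] < xedges[bin_fl+1]:
            fl_elm.append(elm)
    fl_elm = np.array(fl_elm, dtype=int)
    for elem in fl_elm:
        Pprom_elm.append(Pprom_t[elem])
    Pprom_elm = np.array(Pprom_elm)  # y values of the points in this x bin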

So I first get the indices of the elements that fall into each x bin, then use those indices to look up the corresponding values on the other axis. Enjoy!
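For what it's worth, the same grouping can probably be done without explicit loops over the elements. The sketch below uses np.digitize. The arrays width and prom are random stand-ins for Width_t and Pprom_t, and prom_by_bin is a name I made up.

```python
import numpy as np

rng = np.random.default_rng(1)
width = rng.uniform(0, 10, 50)  # stand-in for Width_t (x values)
prom = rng.uniform(0, 1, 50)    # stand-in for Pprom_t (y values)
nbins = 5

counts, xedges = np.histogram(width, bins=nbins)

# 0-based x bin of every element.
idx = np.digitize(width, xedges) - 1
# The last bin is closed on the right, so move the maximum value into it.
idx[width == xedges[-1]] = nbins - 1

# y values grouped by the x bin of their element.
prom_by_bin = {b: prom[idx == b] for b in range(nbins)}

for b in range(nbins):
    print(b, len(prom_by_bin[b]), counts[b])  # the two numbers should match
```

Unlike the loop, this keeps the values for every bin in a dictionary instead of overwriting them on each pass.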
