
2d histogram: Get result of full nbins x nbins

I am using matplotlib's hist2d function to make a 2d histogram of my data, but I am having trouble interpreting the result.

Here is the plot I have:

[image: 2D histogram plot]

This was created using the line:

hist = plt.hist2d(X, Y, (160,160), norm=mpl.colors.LogNorm(vmin=1, vmax=20))

This returns a 2d array of (160, 160), as well as the bin edges etc.

In the plot there are bins which have a high frequency of values (the yellow bins). I would like to be able to get the results of this histogram and filter out the bins that have low values, preserving the high bins. I would expect there to be 160*160 values, but I can only find 160 X and 160 Y values.

What I would like to do is essentially filter out the more dense data from the less dense data. If this means representing the data as a single value (a bin), then that is ok.

Am I misinterpreting the function, or am I not accessing the results correctly? I have also tried with SciPy, but the results seem to be in the same or a similar format.

You need the Seaborn package.

You mentioned

I would like to be able to get the results of this histogram and filter out the bins that have low values, preserving the high bins.

You should definitely be using one of these:

  1. seaborn.jointplot(...,kind='hex') : it shows the counts of observations that fall within hexagonal bins. This plot works best with relatively large datasets.
  2. seaborn.jointplot(...,kind='kde') : it uses kernel density estimation to visualize a bivariate distribution. I recommend this one.

Example 'kde'

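Use the number of levels (n_levels) and shade_lowest=False to ignore low values. In seaborn 0.11 and later these options are spelled levels and thresh, shade is spelled fill, and x and y must be passed as keywords:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

x, y = np.random.randn(2, 300)
plt.figure(figsize=(6, 5))
# fill=True shades the contours; thresh=0.05 leaves the lowest-density region undrawn
sns.kdeplot(x=x, y=y, zorder=0, levels=6, fill=True, cbar=True,
     thresh=0.05, cmap='viridis')
plt.show()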

[image: KDE plot produced by the code above]

Not sure if this is what you wanted.

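The hist2d docs specify that the function returns a tuple of four items, where the first item, h, is the 2D array of bin counts (the heatmap values).

h has the same shape as bins, here (160, 160), so it does hold the full 160*160 values. The xedges and yedges items are the 1D arrays of bin edges, each with one more element than the number of bins along that axis.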
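As a quick check of those shapes, here is a minimal sketch that uses np.histogram2d, which hist2d calls internally to do the binning, so nothing is plotted. The random X and Y are stand-in data assumed for illustration, not the data from the question.

```python
import numpy as np

# Stand-in data (an assumption for illustration; substitute your own X and Y).
rng = np.random.default_rng(0)
X = rng.normal(size=10_000)
Y = rng.normal(size=10_000)

# Same binning that plt.hist2d performs, without drawing anything.
h, xedges, yedges = np.histogram2d(X, Y, bins=(160, 160))

print(h.shape)       # 2D array of counts, one entry per bin
print(xedges.shape)  # bin edges along x: one more than the number of bins
print(yedges.shape)  # bin edges along y
print(int(h.sum()))  # total count equals the number of input points
```

h[i, j] is the number of points with x in bin i and y in bin j, so the x bins run along the first axis of h.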

You can capture the output (it will still plot), and use argwhere to find coordinates where values exceed, say, the 90th percentile:

h, xedges, yedges, img = plt.hist2d(X, Y, bins=(160,160), norm=mpl.colors.LogNorm(vmin=1, vmax=20))

print(list(np.argwhere(h > np.percentile(h, 90))))
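The indices from argwhere refer to bins, not to data values. If the goal is to separate the dense data from the sparse data, one possible approach is to map the dense bins to their centres, or to keep only the original points that fall in a dense bin. The sketch below assumes stand-in random data and the 90th-percentile cutoff used above; names such as dense, dense_centers, X_dense and Y_dense are hypothetical, and np.histogram2d stands in for plt.hist2d so that nothing is plotted.

```python
import numpy as np

# Stand-in data (an assumption for illustration; substitute your own X and Y).
rng = np.random.default_rng(0)
X = rng.normal(size=10_000)
Y = rng.normal(size=10_000)

h, xedges, yedges = np.histogram2d(X, Y, bins=(160, 160))

# Boolean mask of "dense" bins: counts above the 90th percentile of all bins.
dense = h > np.percentile(h, 90)

# Option 1: represent each dense bin by its centre coordinates.
xcenters = 0.5 * (xedges[:-1] + xedges[1:])
ycenters = 0.5 * (yedges[:-1] + yedges[1:])
ix, iy = np.nonzero(dense)
dense_centers = np.column_stack([xcenters[ix], ycenters[iy]])

# Option 2: keep the original points that fall inside a dense bin.
# Bins are half-open [edge_i, edge_i+1); clipping puts values equal to the
# last edge into the final bin, as the histogram itself does.
xi = np.clip(np.searchsorted(xedges, X, side="right") - 1, 0, h.shape[0] - 1)
yi = np.clip(np.searchsorted(yedges, Y, side="right") - 1, 0, h.shape[1] - 1)
keep = dense[xi, yi]
X_dense, Y_dense = X[keep], Y[keep]

print(dense_centers.shape, X_dense.shape)
```

The number of points kept should equal the sum of the counts in the dense bins, which is a handy sanity check on the bin lookup.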

