
2d histogram: Get result of full nbins x nbins

I am using matplotlib's hist2d function to make a 2d histogram of my data, but I am having trouble interpreting the result.

Here is the plot I have:

[image: the 2d histogram plot]

This was created using the line:

hist = plt.hist2d(X, Y, (160,160), norm=mpl.colors.LogNorm(vmin=1, vmax=20))

This returns a 2d array of shape (160, 160), as well as the bin edges etc.

In the plot there are bins with a high frequency of values (the yellow bins). I would like to get the results of this histogram and filter out the bins that have low values, keeping the high bins. I would expect there to be 160*160 values, but I can only find 160 X and 160 Y values.

What I would like to do is essentially separate the denser data from the less dense data. If this means representing the data as a single value (a bin), then that is ok.

Am I misinterpreting the function, or am I not accessing the results correctly? I have also tried with scipy, but the results seem to be in the same or a similar format.

You need the Seaborn package.

You mentioned

I would like to be able to get the results of this histogram and filter out the bins that have low values, preserving the high bins.

You should definitely be using one of these:

  1. seaborn.jointplot(..., kind='hex'): shows the counts of observations that fall within hexagonal bins. This plot works best with relatively large datasets.
  2. seaborn.jointplot(..., kind='kde'): uses kernel density estimation to visualize a bivariate distribution. This is the one I recommend.

Example 'kde'

Use the number of levels n_levels together with shade_lowest=False to ignore low values.

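import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Note: this call uses the older seaborn API. Newer releases expect x=/y= keywords
# and rename shade, n_levels and shade_lowest to fill, levels and thresh.
x, y = np.random.randn(2, 300)  # random sample data
plt.figure(figsize=(6, 5))
# Shaded contours in 6 levels; the lowest-density level is left unshaded
sns.kdeplot(x, y, zorder=0, n_levels=6, shade=True, cbar=True,
            shade_lowest=False, cmap='viridis')
plt.show()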

[image: the resulting KDE plot]

Not sure if this is what you wanted.

The hist2d docs specify that the function returns a tuple of size 4, where the first item h is the heatmap, i.e. the 2d array of bin counts.

This h will have the same shape as bins.
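To see that the full nbins x nbins grid really is there, here is a minimal sketch that skips plotting and calls numpy.histogram2d directly, on the assumption that it returns the same counts and edges that hist2d does. The random X and Y are placeholder data, not the data from the question.

```python
import numpy as np

# Placeholder data standing in for the X and Y from the question
rng = np.random.default_rng(0)
X = rng.normal(size=10_000)
Y = rng.normal(size=10_000)

# Same binning as in the question, but without drawing anything
h, xedges, yedges = np.histogram2d(X, Y, bins=(160, 160))

print(h.shape)       # one count per (x-bin, y-bin) pair
print(xedges.shape)  # nbins + 1 edges along x
print(yedges.shape)  # nbins + 1 edges along y
print(int(h.sum()))  # total number of samples that were binned
```

The count for a single cell is h[i, j], where i indexes the bins along x and j the bins along y. The edge arrays are one-dimensional, which may be why only "160 X and 160 Y values" seemed to be available.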

You can capture the output (it will still plot), and use argwhere to find the coordinates where values exceed, say, the 90th percentile:

h, xedges, yedges, img = plt.hist2d(X, Y, bins=(160,160), norm=mpl.colors.LogNorm(vmin=1, vmax=20))

print(list(np.argwhere(h > np.percentile(h, 90))))
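If the goal is to keep the original data points that fall in dense bins, rather than just the bin indices, one possible approach is to look up each point's bin and index into a boolean mask of dense bins. The sketch below is only an illustration: it uses numpy.histogram2d in place of plt.hist2d so it runs without a figure, the data is synthetic, and the names dense, keep, X_dense and Y_dense are made up for this example.

```python
import numpy as np

# Synthetic stand-in for the X and Y from the question
rng = np.random.default_rng(0)
X = rng.normal(size=5000)
Y = rng.normal(size=5000)

h, xedges, yedges = np.histogram2d(X, Y, bins=(160, 160))

# Boolean mask of bins whose count exceeds the 90th percentile of all bin counts
dense = h > np.percentile(h, 90)

# Bin index of every point along each axis. Bins are half-open [left, right),
# except the last one, which also includes its right edge (hence the clip).
ix = np.clip(np.searchsorted(xedges, X, side="right") - 1, 0, h.shape[0] - 1)
iy = np.clip(np.searchsorted(yedges, Y, side="right") - 1, 0, h.shape[1] - 1)

# Keep only the points that landed in a dense bin
keep = dense[ix, iy]
X_dense, Y_dense = X[keep], Y[keep]

# Centers of the dense bins, in case one value per bin is enough
xcenters = 0.5 * (xedges[:-1] + xedges[1:])
ycenters = 0.5 * (yedges[:-1] + yedges[1:])
dense_idx = np.argwhere(dense)
dense_centers = np.column_stack((xcenters[dense_idx[:, 0]], ycenters[dense_idx[:, 1]]))
```

The percentile is taken over all bins, including empty ones, so with sparse data a fixed count threshold (for example h > 5) may be easier to reason about.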
