How to plot the distribution of a third variable in a 2D histogram?

Imagine you have a data set in three dimensions, x, y and z, and you want to show how they relate. You could do this, for example, with a scatter plot in x and y, adding information about z with the help of a colormap:

[Figure: scatter plot of x and y, colored by z]

But such a plot can be hard to read or even misleading, so I would like to use a 2D histogram in x and y instead and weight each data point by its z value:

[Figure: three 2D histograms of x and y, titled "histogram", "weighted" and "normalized"]

However, as the plot above shows, the magnitude of the bin values can now be much larger than that of any single z value. That makes sense, of course, since each bin value is usually the sum of several z values.
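To make the summing behavior concrete, here is a minimal sketch with three hand-picked points (the values and the two-bin grid are illustrative assumptions, not part of the data set above). It uses `np.histogram2d`, which computes the same binned values that `hist2d` draws, once with `weights` and once without:

```python
import numpy as np

# Three illustrative points: the first two land in the same bin.
x = np.array([0.1, 0.2, 0.8])
y = np.array([0.1, 0.2, 0.8])
z = np.array([-10.0, -20.0, -5.0])

edges = [0.0, 0.5, 1.0]  # two bins per axis

# With weights, each bin holds the SUM of the z values that fall into it.
sums, _, _ = np.histogram2d(x, y, bins=[edges, edges], weights=z)

# Without weights, each bin holds the number of points in it.
counts, _, _ = np.histogram2d(x, y, bins=[edges, edges])

print(sums)    # the shared bin holds -10 + -20
print(counts)  # the shared bin holds 2 points
```

The shared bin ends up with a magnitude larger than any individual z value, which is the effect visible in the "weighted" panel.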

So weighting by z is not enough; I also need to "normalize" each bin value by the number of data points in that bin. But as the right-hand plot above shows, for some reason this doesn't seem to work: the color value range remains unchanged.

What am I doing wrong, and is there a better approach?

Code for reproduction (loosely based on this example):

import matplotlib.pyplot as plt
import numpy as np


# make data: correlated + noise
np.random.seed(1)
x = np.random.randn(5000)
y = 1.2 * x + np.random.randn(5000) / 3
z = np.random.uniform(-100, 0, 5000)


fig, ax = plt.subplots(figsize=(4, 3), constrained_layout=True)
data = ax.scatter(x, y, c=z, s=10)
fig.colorbar(data, ax=ax, label='z')
ax.set(xlabel='x', ylabel='y', title='scatter')
fig.show()

bins = 100
fig, axs = plt.subplots(1, 3, figsize=(10, 3), constrained_layout=True)
_, _, _, img = axs[0].hist2d(x, y, bins=bins, cmin=0.1)
fig.colorbar(img, ax=axs[0])
axs[0].set(xlabel='x', ylabel='y', title='histogram')

_, _, _, img = axs[1].hist2d(x, y, bins=bins, cmax=-0.1, weights=z)
fig.colorbar(img, ax=axs[1])
axs[1].set(xlabel='x', ylabel='y', title='weighted')

_, _, _, img = axs[2].hist2d(x, y, bins=bins, cmax=-0.1, weights=z)
data = img.get_array().reshape((bins, bins))
hist, _, _ = np.histogram2d(x, y, bins=bins)
mask = hist > 0
data[mask] = data[mask]/hist[mask]
img.set_array(data)
img.update_scalarmappable()
fig.colorbar(img, ax=axs[2])
axs[2].set(xlabel='x', ylabel='y', title='"normalized"')
fig.show()

By implementing the solution from this very similar post, I managed to make it work. However, I'm still not sure why my original approach didn't work.

import matplotlib.pyplot as plt
import numpy as np


# make data: correlated + noise
np.random.seed(1)
x = np.random.randn(5000)
y = 1.2 * x + np.random.randn(5000) / 3
z = np.random.uniform(-100, 0, 5000)

bins = 100
fig, axs = plt.subplots(1, 3, figsize=(10, 3), constrained_layout=True)
_, _, _, img = axs[0].hist2d(x, y, bins=bins, cmin=0.1)
fig.colorbar(img, ax=axs[0])
axs[0].set(xlabel='x', ylabel='y', title='histogram')

_, _, _, img = axs[1].hist2d(x, y, bins=bins, cmax=-0.1, weights=z)
fig.colorbar(img, ax=axs[1])
axs[1].set(xlabel='x', ylabel='y', title='weighted')

sums, xbins, ybins = np.histogram2d(x, y, bins=bins, weights=z)
counts, _, _ = np.histogram2d(x, y, bins=bins)
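with np.errstate(divide='ignore', invalid='ignore'):  # suppress possible divide-by-zero warnings
    # histogram2d indexes its result as [x_bin, y_bin], but pcolormesh expects rows along y, hence .T
    img = axs[2].pcolormesh(xbins, ybins, (sums / counts).T)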
fig.colorbar(img, ax=axs[2])
axs[2].set(xlabel='x', ylabel='y', title='"normalized"')
fig.show()

[Figure: output of the code above, with panels "histogram", "weighted" and "normalized"]
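As for why the color range stayed the same in the original attempt, one plausible explanation (offered as an assumption, not a confirmed diagnosis) is that the color limits are fixed when the mesh is first created, so a later `set_array` changes the data but not `vmin`/`vmax`. The sketch below reuses the data from the question, swaps the array on an existing mesh, and then calls `autoscale()` to recompute the limits. The variable names `limits_before`, `limits_after_set_array` and `limits_after_autoscale` are introduced here purely for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(1)
x = np.random.randn(5000)
y = 1.2 * x + np.random.randn(5000) / 3
z = np.random.uniform(-100, 0, 5000)

bins = 100
sums, xedges, yedges = np.histogram2d(x, y, bins=bins, weights=z)
counts, _, _ = np.histogram2d(x, y, bins=bins)

fig, ax = plt.subplots()
# Transposed because histogram2d indexes [x_bin, y_bin], while pcolormesh
# expects rows to run along y. Empty bins are masked out.
img = ax.pcolormesh(xedges, yedges, np.ma.masked_where(counts.T == 0, sums.T))
limits_before = img.get_clim()  # limits derived from the per-bin sums

with np.errstate(divide="ignore", invalid="ignore"):
    means = np.ma.masked_invalid(sums / counts)  # mean z per bin

# Replace the plotted data; flattened so it works across matplotlib versions.
img.set_array(means.T.ravel())
limits_after_set_array = img.get_clim()  # still the old limits

img.autoscale()  # recompute vmin/vmax from the new array
limits_after_autoscale = img.get_clim()
fig.colorbar(img, ax=ax, label="mean z per bin")

print(limits_before, limits_after_set_array, limits_after_autoscale)
```

If this explanation applies, calling `img.autoscale()` (or `img.set_clim(...)`) after `set_array` in the original code should update the color range, provided the counts array is also oriented the same way as the plotted array.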
