How to plot the distribution of a third variable in a 2D histogram?
Imagine you have a data set in three dimensions, x, y and z, and you want to show how they relate. You could do this, for example, with a scatter plot in x and y, adding the information about z with a colormap:
But such a plot can be hard to read or even misleading, so I would like to use a 2D histogram in x and y instead and weight each data point by its z value:
However, as the plot above shows, the magnitude of the bin values can now be much higher than the maximum in z. That makes sense, of course, because each bin value is usually the sum of several z values.
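To make the arithmetic concrete, here is a minimal sketch with four made-up points (the values and the bin layout are purely illustrative, not taken from the data set in this question). Three points fall into the first x-bin and one into the second, so the weighted histogram holds sums, and dividing by the counts recovers the per-bin mean of z.

```python
import numpy as np

# Four hypothetical points: three land in the first x-bin, one in the second.
x = np.array([0.1, 0.2, 0.3, 1.5])
y = np.array([0.5, 0.5, 0.5, 0.5])
z = np.array([-10.0, -20.0, -30.0, -40.0])

# Two bins along x, one bin along y, with explicit ranges.
edges = dict(bins=[2, 1], range=[[0, 2], [0, 1]])

# Weighted histogram: each bin holds the SUM of the z values inside it.
sums, _, _ = np.histogram2d(x, y, weights=z, **edges)
# Unweighted histogram: each bin holds the NUMBER of points inside it.
counts, _, _ = np.histogram2d(x, y, **edges)

print(sums.ravel())             # [-60. -40.]
print(counts.ravel())           # [3. 1.]
print((sums / counts).ravel())  # [-20. -40.]
```

The first bin sums to -60, which is larger in magnitude than any single z value, while its mean of -20 stays inside the range of z.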
So weighting by the z value is not enough; I also need to "normalize" each bin value by the number of data points within it. But as the right-hand plot above shows, for some reason this doesn't seem to work. The range of color values remains unchanged.
What am I doing wrong, and is there a better approach to do this?
Code for reproduction (loosely based on this example):
import matplotlib.pyplot as plt
import numpy as np
# make data: correlated + noise
np.random.seed(1)
x = np.random.randn(5000)
y = 1.2 * x + np.random.randn(5000) / 3
z = np.random.uniform(-100, 0, 5000)
fig, ax = plt.subplots(figsize=(4, 3), constrained_layout=True)
data = ax.scatter(x, y, c=z, s=10)
fig.colorbar(data, ax=ax, label='z')
ax.set(xlabel='x', ylabel='y', title='scatter')
fig.show()
bins = 100
fig, axs = plt.subplots(1, 3, figsize=(10, 3), constrained_layout=True)
_, _, _, img = axs[0].hist2d(x, y, bins=bins, cmin=0.1)
fig.colorbar(img, ax=axs[0])
axs[0].set(xlabel='x', ylabel='y', title='histogram')
_, _, _, img = axs[1].hist2d(x, y, bins=bins, cmax=-0.1, weights=z)
fig.colorbar(img, ax=axs[1])
axs[1].set(xlabel='x', ylabel='y', title='weighted')
_, _, _, img = axs[2].hist2d(x, y, bins=bins, cmax=-0.1, weights=z)
data = img.get_array().reshape((bins, bins))
hist, _, _ = np.histogram2d(x, y, bins=bins)
mask = hist > 0
data[mask] = data[mask]/hist[mask]
img.set_array(data)
img.update_scalarmappable()
fig.colorbar(img, ax=axs[2])
axs[2].set(xlabel='x', ylabel='y', title='"normalized"')
fig.show()
Implementing the solution from this very similar post, I managed to make it work. However, I'm still not sure why my original approach didn't work.
import matplotlib.pyplot as plt
import numpy as np
# make data: correlated + noise
np.random.seed(1)
x = np.random.randn(5000)
y = 1.2 * x + np.random.randn(5000) / 3
z = np.random.uniform(-100, 0, 5000)
bins = 100
fig, axs = plt.subplots(1, 3, figsize=(10, 3), constrained_layout=True)
_, _, _, img = axs[0].hist2d(x, y, bins=bins, cmin=0.1)
fig.colorbar(img, ax=axs[0])
axs[0].set(xlabel='x', ylabel='y', title='histogram')
_, _, _, img = axs[1].hist2d(x, y, bins=bins, cmax=-0.1, weights=z)
fig.colorbar(img, ax=axs[1])
axs[1].set(xlabel='x', ylabel='y', title='weighted')
sums, xbins, ybins = np.histogram2d(x, y, bins=bins, weights=z)
counts, _, _ = np.histogram2d(x, y, bins=bins)
with np.errstate(divide='ignore', invalid='ignore'):  # suppress possible divide-by-zero warnings
    # pcolormesh expects the array as [y, x], so the [x, y] histogram is transposed
    img = axs[2].pcolormesh(xbins, ybins, (sums / counts).T)
fig.colorbar(img, ax=axs[2])
axs[2].set(xlabel='x', ylabel='y', title='"normalized"')
fig.show()
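A possible explanation for the original attempt, offered as an assumption rather than a confirmed diagnosis: hist2d passes the transposed histogram to pcolormesh, so the array read back with img.get_array() is laid out as [y, x], whereas np.histogram2d returns [x, y]. Dividing one by the other without transposing mixes up bins. In addition, set_array replaces the data but does not rescale the color limits, so the colorbar keeps the range of the sums. The NumPy-only sketch below reuses the data from the question and checks the layout and range claims without plotting; the variable names mesh_layout and mean_in_mesh_layout are introduced here for illustration only.

```python
import numpy as np

# Same synthetic data as in the question.
np.random.seed(1)
x = np.random.randn(5000)
y = 1.2 * x + np.random.randn(5000) / 3
z = np.random.uniform(-100, 0, 5000)

bins = 100
sums, xedges, yedges = np.histogram2d(x, y, bins=bins, weights=z)
counts, _, _ = np.histogram2d(x, y, bins=bins)

# Per-bin mean of z; empty bins stay NaN and no warning is raised.
mean = np.full_like(sums, np.nan)
np.divide(sums, counts, out=mean, where=counts > 0)

# The sums leave the range of z, the means cannot.
print(sums.min(), np.nanmin(mean), np.nanmax(mean))

# Stand-in for the array that hist2d draws: the transposed histogram.
# It is indexed [y, x], so it has to be divided by counts.T, not counts.
mesh_layout = sums.T
mean_in_mesh_layout = np.full_like(mesh_layout, np.nan)
np.divide(mesh_layout, counts.T, out=mean_in_mesh_layout, where=counts.T > 0)
```

If the in-place route is preferred, calling img.autoscale() after img.set_array(...) should rescale the color limits to the new data. Depending on the Matplotlib version, get_array() may return a flattened or a two-dimensional array, so the reshape step may need adjusting.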