
R plot 'Heat map' of set of draws

I have a matrix with x rows (i.e. the number of draws) and y columns (the number of observations). Together they represent a distribution of y forecasts.

Now I would like to make a sort of 'heat map' of the draws. That is, I want to plot a 'confidence interval' (not really a confidence interval, just the full range of values with shading in between), but as a 'heat map' (an example of a heat map). That means that if, for instance, a lot of draws for observation y=y* were around 1 but there was also a draw of 5 for that same observation, then the area of the interval around 1 is darker (but the whole area between 1 and 5 is still shaded).

To be totally clear: I like, for instance, the plot in the answer here, but I would want the grey confidence interval to instead be colored by intensity (i.e. some areas are darker).

Could someone please tell me how I could achieve that?

Thanks in advance.

Edit: As per request, example data. The first 20 values of the first column (i.e. y[1:20,1]):

 [1]  0.032067416 -0.064797792  0.035022338  0.016347263  0.034373065  0.024793101 -0.002514447  0.091411355 -0.064263536 -0.026808208
[11]  0.125831185 -0.039428744  0.017156454 -0.061574540 -0.074207109 -0.029171227  0.018906181  0.092816957  0.028899699 -0.004535961

So, the hard part of this is transforming your data into the right shape, which is why it's nice to share something that really looks like your data, not just a single column.

Let's say your data is a matrix with 10,000 rows and 10 columns. I'll just use a uniform distribution, so it will be a boring plot at the end:

n = 10000
k = 10
mat = matrix(runif(n * k), nrow = n)

Next, we'll calculate quantiles for each column using apply, transpose the result, and make it a data frame:

dat = as.data.frame(t(apply(mat, MARGIN = 2, FUN = quantile, probs = seq(.1, 0.9, 0.1))))

Add an x variable (since we transposed, each x value corresponds to a column in the original data):

dat$x = 1:nrow(dat)

We now need to get it into a "long" form, grouped by the min and max values for a certain deviation group around the median, and of course get rid of the pesky percent signs introduced by quantile:

library(dplyr)
library(tidyr)
dat_long = gather(dat, "quantile", value = "y", -x) %>%
    mutate(quantile = as.numeric(gsub("%", "", quantile)),
           group = abs(50 - quantile))

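# Lower quantiles become ymin, upper quantiles become ymax;
# rows are matched on x and on the distance from the median (group)
dat_ribbon = dat_long %>% filter(quantile < 50) %>%
    mutate(ymin = y) %>%
    select(x, ymin, group) %>%
    left_join(
        dat_long %>% filter(quantile > 50) %>%
        mutate(ymax = y) %>%
        select(x, ymax, group),
        by = c("x", "group")
    )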

dat_median = filter(dat_long, quantile == 50)
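
As a side note, gather() is superseded in current tidyr releases, so the same reshaping can also be written with pivot_longer() and pivot_wider(). The following is only a sketch under assumptions: the toy matrix, the seed, and the helper column name bound are made up for illustration and are not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the matrix of draws (rows = draws, columns = observations)
set.seed(1)
mat <- matrix(runif(1000 * 10), nrow = 1000)

dat <- as.data.frame(t(apply(mat, 2, quantile, probs = seq(0.1, 0.9, 0.1))))
dat$x <- seq_len(nrow(dat))

# Long form: one row per (x, quantile), with the "%" suffix stripped
dat_long <- dat %>%
    pivot_longer(cols = -x, names_to = "quantile", values_to = "y") %>%
    mutate(quantile = as.numeric(sub("%", "", quantile, fixed = TRUE)),
           group = abs(50 - quantile))

# Wide again per group: the lower quantile is ymin, the upper one is ymax
# ("bound" is a hypothetical helper column)
dat_ribbon <- dat_long %>%
    filter(quantile != 50) %>%
    mutate(bound = if_else(quantile < 50, "ymin", "ymax")) %>%
    select(x, group, bound, y) %>%
    pivot_wider(names_from = bound, values_from = y)

dat_median <- filter(dat_long, quantile == 50)
```

This should yield the same x, group, ymin, ymax layout as the join-based version, without needing a join.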

And finally we can plot. We'll plot a transparent ribbon for each "group", that is, the 10%-90% interval, the 20%-80% interval, ... the 40%-60% interval, and then a single line at the median (50%). Using transparency, the middle will be darker as it has more ribbons overlapping on top of it. This doesn't go from the minimum to the maximum, but it will if you set the probs in the quantile call to go from 0 to 1 instead of .1 to .9.

library(ggplot2)
ggplot(dat_ribbon, aes(x = x)) +
    geom_ribbon(aes(ymin = ymin, ymax = ymax, group = group), alpha = 0.2) +
    geom_line(aes(y = y), data = dat_median, color = "white")
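
To shade the whole range from minimum to maximum, as the question asks, the probabilities can start at 0 and be mirrored around the median. The sketch below is one compact way to do that without the reshaping step; the toy matrix, the seed, and the names lower, ribbons, med, and plt are assumptions for illustration only.

```r
library(ggplot2)

# Toy stand-in for the matrix of draws (rows = draws, columns = observations)
set.seed(1)
mat <- matrix(runif(1000 * 10), nrow = 1000)

# Lower-tail probabilities; each one is paired with its mirror image 1 - p,
# so p = 0 gives the min-to-max band and p = 0.4 the 40%-60% band
lower <- seq(0, 0.4, by = 0.1)

ribbons <- do.call(rbind, lapply(lower, function(p) {
    data.frame(
        x     = seq_len(ncol(mat)),
        ymin  = apply(mat, 2, quantile, probs = p),
        ymax  = apply(mat, 2, quantile, probs = 1 - p),
        group = p
    )
}))

med <- data.frame(x = seq_len(ncol(mat)), y = apply(mat, 2, median))

plt <- ggplot(ribbons, aes(x = x)) +
    geom_ribbon(aes(ymin = ymin, ymax = ymax, group = group), alpha = 0.2) +
    geom_line(aes(y = y), data = med, color = "white")
print(plt)
```

With five overlapping ribbons at alpha = 0.2 the centre is darkest and the outermost band, which reaches a single extreme draw such as the 5 in the question's example, stays faintly shaded.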


Worth noting that this is not a conventional heatmap. A heatmap usually implies that you have 3 variables, x, y, and z (color), where there is a z-value for every xy pair. Here you have two variables, x and y, with y depending on x.
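
If a conventional heatmap is what you are after, a z-value can be derived by counting how many draws of each observation fall into each value bin. This is only a sketch of that idea, not part of the approach above; the normal toy data, the choice of 30 bins, and the names breaks, mids, counts, and heat are all assumptions.

```r
library(ggplot2)

# Toy stand-in for the matrix of draws (rows = draws, columns = observations)
set.seed(1)
n <- 1000
k <- 10
mat <- matrix(rnorm(n * k), nrow = n)

# 30 equal-width value bins that cover every draw
breaks <- seq(floor(min(mat)), ceiling(max(mat)), length.out = 31)
mids <- (head(breaks, -1) + tail(breaks, -1)) / 2

# counts[i, j] = number of draws of observation j that fall in value bin i
counts <- apply(mat, 2, function(col) {
    table(cut(col, breaks = breaks, include.lowest = TRUE))
})

# as.vector() walks the count matrix column by column, so x repeats each
# observation index once per bin while y cycles through the bin midpoints
heat <- data.frame(
    x = rep(seq_len(k), each = length(mids)),
    y = rep(mids, times = k),
    z = as.vector(counts)
)

plt <- ggplot(heat, aes(x = x, y = y, fill = z)) +
    geom_tile() +
    scale_fill_gradient(low = "white", high = "black")
print(plt)
```

Unlike the ribbon plot, empty bins here are drawn in the low colour rather than shaded, so the region between an outlier and the bulk of the draws is not filled in.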

That is not a lot to go on, but I would probably start with the hexbin package and its hexbin or hexbinplot functions. Several alternatives are presented in this SO post.

Formatting and manipulating a plot from the R package "hexbin"
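
As a rough starting point with hexbin, the matrix of draws can be flattened so that the observation index goes on the x axis and the draw value on the y axis. This is a minimal sketch with made-up data; the toy matrix, the seed, and xbins = 20 are assumptions, not values from the question.

```r
library(hexbin)

# Toy stand-in for the matrix of draws (rows = draws, columns = observations)
set.seed(1)
n <- 1000
k <- 10
mat <- matrix(rnorm(n * k), nrow = n)

# as.vector() flattens column by column, so each observation index
# is repeated once for every draw of that observation
x <- rep(seq_len(k), each = n)
y <- as.vector(mat)

# Bin the (observation, value) pairs into hexagons and plot the counts
hb <- hexbin(x, y, xbins = 20)
plot(hb, xlab = "observation", ylab = "draw value")
```

Darker hexagons mark regions where many draws fall, which is close to the intensity shading described in the question.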
