R - ggplot2 - Get a histogram of the difference between two groups

Question

Suppose I have two overlapping histograms. Here is a possible ggplot2 command and a mock-up of the output plot.

ggplot(data, aes(x=Variable1, fill=BinaryVariable)) + geom_histogram(position="identity")

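So what I have is the frequency, or count, of each event. What I would like is the difference between the two events in each bin. Is this possible? How?

For example, if we take red minus blue:

At x = 2 the value is ~ -10
At x = 4 the value is ~ 40 - 200 = -160
At x = 6 the value is ~ 190 - 25 = 155
At x = 8 the value is ~ 10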

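I would prefer to do this with ggplot2, but another way is fine too. My data frame is laid out like this toy example (the real dimensions are 25000 rows x 30 columns). EDIT: here is sample data to work with, as a GIST.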

ID   Variable1   BinaryVariable
1     50            T          
2     55            T
3     51            N
..    ..            ..
1000  1001          T
1001  1944          T
1002  1042          N

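As the example shows, I am interested in a histogram of Variable1 (a continuous variable) plotted separately for each BinaryVariable (T or N). But what I really want is the difference in frequency between them.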

Answer 1

To do this, we need to make sure the bins used for the histogram are the same for both levels of the indicator variable. Here is a somewhat naive solution (in R):

df = data.frame(y = c(rnorm(50), rnorm(50, mean = 1)),
                x = rep(c(0, 1), each = 50))
# histogram of the full data; ask for more breaks than probably necessary
fullhist = hist(df$y, breaks = 20, plot = FALSE)
# histograms for x == 0 and x == 1, reusing the breaks from the full histogram
zerohist = with(subset(df, x == 0), hist(y, breaks = fullhist$breaks, plot = FALSE))
oneshist = with(subset(df, x == 1), hist(y, breaks = fullhist$breaks, plot = FALSE))
# combine: keep the shared bins and replace the counts with the per-bin difference
combhist = fullhist
combhist$counts = zerohist$counts - oneshist$counts
plot(combhist)

So we fix the breaks to use (taken from the histogram of the full data) and then compute the difference in counts in each bin.
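The same shared-breaks idea can be wrapped in a small function and applied to data shaped like the question's toy example. This is only a sketch: `diff_hist` is a hypothetical helper name, the simulated `toy` data frame is made up for illustration, and the column names `Variable1` and `BinaryVariable` are taken from the question.

```r
# Sketch: per-bin difference in counts between two groups, using shared breaks.
# `diff_hist` is a hypothetical helper, not part of base R or ggplot2.
diff_hist <- function(values, groups, level_a, level_b, n_breaks = 20) {
  # Breaks are computed once on the pooled data so both groups share the same bins
  breaks <- hist(values, breaks = n_breaks, plot = FALSE)$breaks
  h_a <- hist(values[groups == level_a], breaks = breaks, plot = FALSE)
  h_b <- hist(values[groups == level_b], breaks = breaks, plot = FALSE)
  data.frame(
    xmin = head(breaks, -1),
    xmax = tail(breaks, -1),
    diff = h_a$counts - h_b$counts
  )
}

# Simulated stand-in for the question's data (assumed, not the real data set)
set.seed(1)
toy <- data.frame(
  Variable1 = c(rnorm(500, mean = 4), rnorm(500, mean = 6)),
  BinaryVariable = rep(c("T", "N"), each = 500)
)
res <- diff_hist(toy$Variable1, toy$BinaryVariable, "T", "N")

# Bars below zero mark bins where "N" outnumbers "T"
barplot(res$diff, names.arg = round((res$xmin + res$xmax) / 2, 1),
        xlab = "Variable1 (bin midpoint)", ylab = "count(T) - count(N)")
```

Returning a data frame rather than a modified histogram object keeps the bin edges next to the differences, which also makes the result easy to hand to ggplot2 later.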

P.S. Inspecting the non-graphical output of hist() may be helpful.
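As a quick illustration of that non-graphical output, the sketch below calls hist() with plot = FALSE on made-up normal data and looks at the components the answer relies on.

```r
set.seed(42)
h <- hist(rnorm(100), breaks = 10, plot = FALSE)

# The returned object is a list; the answer above uses $breaks and $counts
names(h)

# There is always one more break than there are bins
length(h$breaks) - length(h$counts)

# The counts add up to the number of observations
sum(h$counts)
```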

Answer 2

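Here is a solution that uses ggplot, as requested. The key idea is to use ggplot_build to get the rectangles computed for the histogram (geom_histogram uses stat_bin by default). From those you can compute the difference in each bin and then create a new plot using geom_rect.

Setup and creation of a mock data set with lognormal data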

library(ggplot2)
library(data.table)
theme_set(theme_bw())
n1<-500
n2<-500
k1 <- exp(rnorm(n1,8,0.7))
k2 <- exp(rnorm(n2,10,1))
df <- data.table(k=c(k1,k2),label=c(rep('k1',n1),rep('k2',n2)))

Create the first plot

p <- ggplot(df, aes(x=k,group=label,color=label)) + geom_histogram(bins=40) + scale_x_log10()

Get the rectangles using `ggplot_build`

p_data <- as.data.table(ggplot_build(p)$data[[1]])[,.(count,xmin,xmax,group)]
p1_data <- p_data[group==1]
p2_data <- p_data[group==2]

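Join on the x coordinates to compute the difference. Note that the y values returned by ggplot_build are not the counts but the y coordinates in the first plot (the bars are stacked by default), so the difference is computed from the count column.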

newplot_data <- merge(p1_data, p2_data, by=c('xmin','xmax'), suffixes = c('.p1','.p2'))
newplot_data <- newplot_data[,diff:=count.p1 - count.p2]
setnames(newplot_data, old=c('count.p1','count.p2'), new=c('k1','k2'))

df2 <- melt(newplot_data,id.vars =c('xmin','xmax'),measure.vars=c('k1','diff','k2'))
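The distinction between y and count can be checked directly on the built plot data. The sketch below rebuilds a histogram like the one above with a plain data frame (the seed and the names `d` and `built` are assumptions made for illustration) and compares the columns.

```r
library(ggplot2)

set.seed(1)
# Simulated data with the same shape as in the answer
d <- data.frame(
  k = c(exp(rnorm(500, 8, 0.7)), exp(rnorm(500, 10, 1))),
  label = rep(c("k1", "k2"), each = 500)
)
p <- ggplot(d, aes(x = k, group = label, color = label)) +
  geom_histogram(bins = 40) +
  scale_x_log10()

built <- ggplot_build(p)$data[[1]]

# With the default position = "stack", each bar's height equals its count...
all.equal(built$ymax - built$ymin, built$count)

# ...but y is the top of the stacked bar, so it can differ from count
sum(built$y != built$count)
```

If the first plot is drawn with position = "identity" instead, y and count coincide, but using count is safe in either case.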

Make the final plot

ggplot(df2, aes(xmin=xmin,xmax=xmax,ymax=value,ymin=0,group=variable,color=variable)) + geom_rect()

Of course, the scales and the legend still need fixing, but that is a different topic.
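One possible way to tidy the result is to plot only the difference and label the axes explicitly. This is a sketch under assumptions, not the answer's own code: it uses base data frames instead of data.table, and the names `bins` and `larger` as well as the legend wording are made up here. Because the first plot uses scale_x_log10(), the bin edges returned by ggplot_build are already on the log10 scale, which the x label states.

```r
library(ggplot2)

set.seed(1)
d <- data.frame(
  k = c(exp(rnorm(500, 8, 0.7)), exp(rnorm(500, 10, 1))),
  label = rep(c("k1", "k2"), each = 500)
)
p <- ggplot(d, aes(x = k, fill = label)) +
  geom_histogram(bins = 40) +
  scale_x_log10()
built <- ggplot_build(p)$data[[1]]

# One row per bin: merge the two groups on the shared bin edges
g1 <- built[built$group == 1, c("xmin", "xmax", "count")]
g2 <- built[built$group == 2, c("xmin", "xmax", "count")]
bins <- merge(g1, g2, by = c("xmin", "xmax"), suffixes = c(".k1", ".k2"))
bins$diff <- bins$count.k1 - bins$count.k2

# Hypothetical legend labels describing the sign of the difference
bins$larger <- ifelse(bins$diff > 0, "k1 larger", "k2 larger or equal")

# xmin/xmax are log10(k) because of scale_x_log10() in `p`
ggplot(bins, aes(xmin = xmin, xmax = xmax, ymin = 0, ymax = diff, fill = larger)) +
  geom_rect(color = "grey30") +
  labs(x = "log10(k)", y = "count(k1) - count(k2)", fill = NULL)
```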


Answer 1: 3 · accepted · 2016-03-17 01:29:03
Answer 2: 0 · 2019-09-13 00:31:58
