
How to plot the difference between two ggplot density distributions?

I would like to use ggplot2 to illustrate the difference between two similar density distributions. Here is a toy example of the type of data I have:

library(ggplot2)

# Make toy data
n_sp  <- 100000
n_dup <- 50000
D <- data.frame( 
    event=c(rep("sp", n_sp), rep("dup", n_dup) ), 
    q=c(rnorm(n_sp, mean=2.0), rnorm(n_dup, mean=2.1)) 
)

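# Standard density plot
# after_stat(density) replaces the deprecated ..density.. notation (needs ggplot2 >= 3.3.0)
ggplot( D, aes( x=q, y=after_stat(density), col=event ) ) +
    geom_freqpoly()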

Rather than plotting the density for each category (dup and sp) separately, as above, how could I plot a single line that shows the difference between these distributions?

In the toy example above, if I subtracted the dup density distribution from the sp density distribution, the resulting line would be above zero on the left side of the plot (since there is an abundance of smaller sp values) and below zero on the right (since there is an abundance of larger dup values). Note that there may be a different number of observations of type dup and sp.
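To make the expected sign pattern concrete, here is a minimal sketch that uses the exact normal densities behind the toy data (means 2.0 and 2.1, standard deviation 1) instead of estimates from the samples. The names `x`, `diff_theory`, `crossing`, `left_positive` and `right_negative` are made up for this illustration. Because densities integrate to 1, the different sample sizes play no role here.

```r
# Exact densities of the toy example; no sampling involved
x <- seq(-2, 6, by = 0.01)
diff_theory <- dnorm(x, mean = 2.0) - dnorm(x, mean = 2.1)  # sp minus dup

# With equal standard deviations the two curves cross midway between the means
crossing <- (2.0 + 2.1) / 2

# Check the sign on each side, leaving out the grid point at the crossing itself
left_positive  <- all(diff_theory[x < crossing - 0.005] > 0)
right_negative <- all(diff_theory[x > crossing + 0.005] < 0)
```

Any difference curve estimated from the samples should follow this shape, up to sampling noise and smoothing.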

More generally, what is the best way to show differences between similar density distributions?

There may be a way to do this within ggplot, but it is frequently easiest to do the calculations beforehand. In this case, call density on each subset of q over the same range, then subtract the y values. Using dplyr (translate to base R or data.table if you wish):

library(dplyr)
library(ggplot2)

D %>% group_by(event) %>% 
    # calculate densities for each group over same range; store in list column
    summarise(d = list(density(q, from = min(.$q), to = max(.$q)))) %>% 
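    # make a new data.frame from two density objects
    # (groups are sorted by event, so d[[1]] is "dup" and d[[2]] is "sp":
    #  this plots dup - sp; swap the two terms to get sp - dup)
    do(data.frame(x = .$d[[1]]$x,    # grab one set of x values (which are the same)
                  y = .$d[[1]]$y - .$d[[2]]$y)) %>%    # and subtract the y values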
    ggplot(aes(x, y)) +    # now plot
    geom_line()

[Figure: plot of the subtracted densities]
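For anyone who wants the base R translation mentioned above, here is a minimal sketch of the same idea. It is not a definitive implementation: the seed and the names `rng`, `d_sp`, `d_dup` and `diff_df` are assumptions made for this example, and the subtraction is written as sp minus dup to match the direction described in the question. Each call to `density` picks its own default bandwidth, so the two groups are smoothed slightly differently.

```r
# Base R sketch: difference between two kernel density estimates
set.seed(1)  # assumption: fixed seed so the toy data is reproducible

# Toy data, as in the question
n_sp  <- 100000
n_dup <- 50000
D <- data.frame(
    event = c(rep("sp", n_sp), rep("dup", n_dup)),
    q     = c(rnorm(n_sp, mean = 2.0), rnorm(n_dup, mean = 2.1))
)

# Evaluate both densities over the same range so the x grids match
rng   <- range(D$q)
d_sp  <- density(D$q[D$event == "sp"],  from = rng[1], to = rng[2])
d_dup <- density(D$q[D$event == "dup"], from = rng[1], to = rng[2])

# Subtract the y values: sp minus dup
diff_df <- data.frame(x = d_sp$x, y = d_sp$y - d_dup$y)

# Plot the difference with a dashed reference line at zero
plot(diff_df$x, diff_df$y, type = "l",
     xlab = "q", ylab = "density difference (sp - dup)")
abline(h = 0, lty = 2)
```

Since `diff_df` is an ordinary data frame, it can also be passed to `ggplot(diff_df, aes(x, y)) + geom_line()` in place of the base `plot` call.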
