
Draw vertical quantile lines over histogram

I currently generate the following plot using ggplot in R:

The data is stored in a single data frame with three columns: PDF (the y-axis in the plot above), mids (x) and dataset name. It is built from histograms.
What I want to do is plot a color-coded vertical line for each dataset marking its 95th percentile, as in the example I painted by hand below:

I tried to use + geom_line(stat="vline", xintercept="mean"), but of course I'm looking for the quantiles, not the mean, and AFAIK ggplot does not allow that. Colors are fine.
I also tried + stat_quantile(quantiles = 0.95), but I'm not sure what exactly it does. Documentation is very scarce. Colors, again, are fine.

Please note that density values are very low, down to 1e-8. I don't know if the quantile() function likes that.

I understand that calculating the quantile of a histogram is not quite the same as calculating that of a list of numbers. I don't know how it would help, but the HistogramTools package contains an ApproxQuantile() function for histogram quantiles.
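To make the distinction concrete, here is a minimal base-R sketch of a histogram quantile computed from bin edges and counts alone, by linear interpolation of the cumulative counts inside the bin where the target probability is reached. The helper name hist_quantile is hypothetical (it is not part of any package), and the sketch assumes 0 < p < 1 and at least one non-empty bin. It is not meant to reproduce what ApproxQuantile() does internally.

```r
# Hypothetical helper (not from any package): approximate the p-quantile of a
# histogram by interpolating linearly inside the bin where the CDF reaches p.
# Assumes 0 < p < 1 and at least one non-empty bin.
hist_quantile <- function(breaks, counts, p) {
  cum <- cumsum(counts) / sum(counts)     # CDF at the right edge of each bin
  i <- which(cum >= p)[1]                 # first bin whose right edge reaches p
  below <- if (i > 1) cum[i - 1] else 0   # CDF at the left edge of that bin
  frac <- (p - below) / (cum[i] - below)  # fraction of the bin needed to reach p
  breaks[i] + frac * (breaks[i + 1] - breaks[i])
}

# Same data as the example below; plot = FALSE only computes the bins.
v <- c(1:30, 2:50, 1:20, 1:5, 1:100, 1, 2, 1, 1:5, 0, 0, 0, 5, 1, 3, 7, 24, 77)
h <- hist(v, breaks = 0:100, plot = FALSE)

hist_quantile(h$breaks, h$counts, 0.95)  # quantile recovered from the histogram
quantile(v, 0.95)                        # quantile of the raw values, for comparison
```

The two results need not match exactly, since the histogram only knows how many values fall in each bin, not where they sit inside it.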

A minimal working example is included below. As you can see, I obtain a data frame from each histogram, then bind the data frames together and plot the result.

library(ggplot2)
v <- c(1:30, 2:50, 1:20, 1:5, 1:100, 1, 2, 1, 1:5, 0, 0, 0, 5, 1, 3, 7, 24, 77)
h <- hist(v, breaks=c(0:100))
df1 <- data.frame(h$mids,h$density,rep("dataset1", 100))
colnames(df1) <- c('Bin','Pdf','Dataset')
df2 <- data.frame(h$mids*2,h$density*2,rep("dataset2", 100))
colnames(df2) <- c('Bin','Pdf','Dataset')
df_tot <- rbind(df1, df2)

ggplot(data=df_tot[which(df_tot$Pdf>0),], aes(x=Bin, y=Pdf, group=Dataset, colour=Dataset)) +
  geom_point(aes(color=Dataset), alpha = 0.7, size=1.5)

Precomputing these values and plotting them separately seems like the simplest option. Doing so with dplyr requires minimal effort:

library(dplyr)
q.95 <- df_tot %>%
  group_by(Dataset) %>%
  summarise(Bin_q.95 = quantile(Bin, 0.95))

ggplot(data=df_tot[which(df_tot$Pdf>0),], 
       aes(x=Bin, y=Pdf, group=Dataset, colour=Dataset)) +
  geom_point(aes(color=Dataset), alpha = 0.7, size=1.5) + 
  geom_vline(data = q.95, aes(xintercept = Bin_q.95, colour = Dataset))
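One caveat: quantile(Bin, 0.95) takes the quantile of the bin midpoints themselves and ignores Pdf, so with evenly spaced bins the line lands at the same relative position whatever the shape of the distribution. If the line should reflect the distribution, one option is to weight each bin by its density. The following is a rough sketch under stated assumptions, not a definitive method: it assumes equal-width bins within each dataset, takes the first midpoint at which the normalised cumulative Pdf reaches 0.95 (no interpolation inside the bin), and the name q.95w is made up for illustration.

```r
library(ggplot2)
library(dplyr)

# Rebuild the example data from the question.
v <- c(1:30, 2:50, 1:20, 1:5, 1:100, 1, 2, 1, 1:5, 0, 0, 0, 5, 1, 3, 7, 24, 77)
h <- hist(v, breaks = 0:100, plot = FALSE)
df_tot <- rbind(
  data.frame(Bin = h$mids,     Pdf = h$density,     Dataset = "dataset1"),
  data.frame(Bin = h$mids * 2, Pdf = h$density * 2, Dataset = "dataset2")
)

# Density-weighted 95th percentile per dataset: the first bin midpoint at
# which the normalised cumulative Pdf reaches 0.95.
# Assumption: bins have equal width within each dataset.
q.95w <- df_tot %>%
  group_by(Dataset) %>%
  arrange(Bin, .by_group = TRUE) %>%
  summarise(Bin_q.95 = Bin[which(cumsum(Pdf) / sum(Pdf) >= 0.95)[1]])

# Same plot as above, with the weighted quantiles as vertical lines.
p <- ggplot(df_tot[df_tot$Pdf > 0, ], aes(x = Bin, y = Pdf, colour = Dataset)) +
  geom_point(alpha = 0.7, size = 1.5) +
  geom_vline(data = q.95w, aes(xintercept = Bin_q.95, colour = Dataset))
p
```

Because the cumulative sum is normalised by sum(Pdf), very small density values such as 1e-8 should not be a problem in themselves; only the relative weights matter.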

