Draw vertical quantile lines over histogram
I currently generate the following plot using ggplot in R:
The data is stored in a single data frame with three columns: PDF (the y-axis in the plot above), mids (the x-axis) and dataset name. It is created from histograms.
What I want to do is plot a color-coded vertical line for each dataset representing the 95th percentile (the 0.95 quantile), as in the example I painted manually below:
I tried to use
+ geom_line(stat="vline", xintercept="mean")
but of course I'm looking for the quantiles, not the mean, and AFAIK ggplot does not allow that. Colors are fine.
I also tried
+ stat_quantile(quantiles = 0.95)
but I'm not sure what exactly it does. Documentation is very scarce. Colors, again, are fine.
Please note that density values are very low, down to 1e-8. I don't know if the quantile() function likes that.
I understand that calculating the quantile of a histogram is not quite the same as calculating that of a list of numbers. I don't know how it would help, but the HistogramTools package contains an ApproxQuantile() function for histogram quantiles.
A minimal working example is included below. As you can see, I obtain a data frame from each histogram, then bind the data frames together and plot the result.
library(ggplot2)
v <- c(1:30, 2:50, 1:20, 1:5, 1:100, 1, 2, 1, 1:5, 0, 0, 0, 5, 1, 3, 7, 24, 77)
h <- hist(v, breaks=c(0:100))
df1 <- data.frame(h$mids,h$density,rep("dataset1", 100))
colnames(df1) <- c('Bin','Pdf','Dataset')
df2 <- data.frame(h$mids*2,h$density*2,rep("dataset2", 100))
colnames(df2) <- c('Bin','Pdf','Dataset')
df_tot <- rbind(df1, df2)
ggplot(data=df_tot[which(df_tot$Pdf>0),], aes(x=Bin, y=Pdf, group=Dataset, colour=Dataset)) +
  geom_point(aes(color=Dataset), alpha = 0.7, size=1.5)
Precomputing these values and plotting them separately seems like the simplest option. Doing so with dplyr requires minimal effort:
library(dplyr)
q.95 <- df_tot %>%
  group_by(Dataset) %>%
  summarise(Bin_q.95 = quantile(Bin, 0.95))
ggplot(data=df_tot[which(df_tot$Pdf>0),],
       aes(x=Bin, y=Pdf, group=Dataset, colour=Dataset)) +
  geom_point(aes(color=Dataset), alpha = 0.7, size=1.5) +
  geom_vline(data = q.95, aes(xintercept = Bin_q.95, colour = Dataset))
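One caveat: quantile(Bin, 0.95) in the snippet above takes the 0.95 quantile of the bin midpoints themselves and never looks at Pdf, so every bin counts equally. If the line should mark where 95% of the histogram mass lies, a density-weighted version may be closer to what the question asks. The following is a minimal sketch, assuming equal-width bins within each dataset. hist_quantile() is a hypothetical helper, not part of dplyr or ggplot2, and it returns the midpoint of the bin where the cumulative share first reaches p rather than interpolating inside the bin:

```r
library(dplyr)
library(ggplot2)

# Rebuild the example data from the question (plot = FALSE skips the base plot)
v <- c(1:30, 2:50, 1:20, 1:5, 1:100, 1, 2, 1, 1:5, 0, 0, 0, 5, 1, 3, 7, 24, 77)
h <- hist(v, breaks = 0:100, plot = FALSE)
df1 <- data.frame(Bin = h$mids, Pdf = h$density, Dataset = "dataset1")
df2 <- data.frame(Bin = h$mids * 2, Pdf = h$density * 2, Dataset = "dataset2")
df_tot <- rbind(df1, df2)

# Hypothetical helper: midpoint of the first bin at which the cumulative
# share of density reaches p. Assumes all bins have the same width, so the
# density values can be used directly as weights.
hist_quantile <- function(bin, pdf, p) {
  ord <- order(bin)
  cdf <- cumsum(pdf[ord]) / sum(pdf)
  bin[ord][which(cdf >= p)[1]]
}

# One density-weighted 0.95 quantile per dataset
q.95w <- df_tot %>%
  group_by(Dataset) %>%
  summarise(Bin_q.95 = hist_quantile(Bin, Pdf, 0.95))

# Same plot as above, with the weighted quantiles as vertical lines
p <- ggplot(df_tot[df_tot$Pdf > 0, ], aes(x = Bin, y = Pdf, colour = Dataset)) +
  geom_point(alpha = 0.7, size = 1.5) +
  geom_vline(data = q.95w, aes(xintercept = Bin_q.95, colour = Dataset))
```

Printing p draws the plot. Because only ratios of cumulative sums are used, very small density values such as 1e-8 should not be a problem here, as long as the bins are equally wide; with unequal widths the weights would need to be Pdf multiplied by bin width.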