Find points over and under the confidence interval when using geom_stat / geom_smooth in ggplot2

I have a scatter plot. How can I find the genes (points) that lie above and below the confidence interval lines?

EDIT: Reproducible example:

library(ggplot2)
#dummy data
df <- mtcars[,c("mpg","cyl")]

#plot
ggplot(df,aes(mpg,cyl)) +
  geom_point() +
  geom_smooth()

This solution takes advantage of the hard work ggplot2 does for you:

library(sp)

# we have to build the plot first so ggplot can do the calculations
ggplot(df,aes(mpg,cyl)) +
  geom_point() +
  geom_smooth() -> gg

# do the calculations
gb <- ggplot_build(gg)

# get the CI data
p <- gb$data[[2]]

# make a polygon out of it
poly <- data.frame(
  x=c(p$x[1],    p$x,    p$x[length(p$x)],    rev(p$x)), 
  y=c(p$ymax[1], p$ymin, p$ymax[length(p$x)], rev(p$ymax))
)

# test for original values in said polygon and add that to orig data
# so we can color by it
df$in_ci <- point.in.polygon(df$mpg, df$cyl, poly$x, poly$y)

# re-do the plot with the new data
ggplot(df,aes(mpg,cyl)) +
  geom_point(aes(color=factor(in_ci))) +
  geom_smooth()

It needs a bit of tweaking (i.e., that last point getting a value of 2), but I'm limited on time. Note that the point.in.polygon return values are:

  • 0 : point is strictly exterior to pol
  • 1 : point is strictly interior to pol
  • 2 : point lies on the relative interior of an edge of pol
  • 3 : point is a vertex of pol

so it should be easy to change the code to TRUE / FALSE depending on whether the value is 0 or not.
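As a rough sketch of that conversion: the unit square and the test points below are made up for illustration, standing in for the confidence band polygon and the data. Anything that is not strictly exterior (code 0) is treated as inside.

library(sp)

# Toy polygon (unit square) standing in for the confidence band polygon
pol_x <- c(0, 1, 1, 0)
pol_y <- c(0, 0, 1, 1)

# Hypothetical test points: one clearly inside, one clearly outside
px <- c(0.5, 2.0)
py <- c(0.5, 2.0)

# Integer codes 0..3 as listed above
codes <- point.in.polygon(px, py, pol_x, pol_y)

# Collapse to TRUE / FALSE: only code 0 counts as outside
inside <- codes != 0
print(inside)

In the answer's code the same idea would be df$in_ci <- point.in.polygon(df$mpg, df$cyl, poly$x, poly$y) != 0, which also makes the point with value 2 count as inside.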

I had to take a deep dive into the GitHub repo, but I finally got it. To do this you need to know how stat_smooth works. In this specific case the loess function is called to do the smoothing (the same process as below can be followed for the other smoothing functions):

So, using loess here, we would do:

#data
df <- mtcars[,c("mpg","cyl")]
#run loess model
cars.lo <- loess(cyl ~ mpg, df)

Then I had to read the relevant source to see how the predictions are made internally in stat_smooth. Apparently Hadley uses the predictdf function (which is not exported from the namespace) as follows for our case:

predictdf.loess <- function(model, xseq, se, level) {
  pred <- stats::predict(model, newdata = data.frame(x = xseq), se = se)

  if (se) {
    y = pred$fit
    ci <- pred$se.fit * stats::qt(level / 2 + .5, pred$df)
    ymin = y - ci
    ymax = y + ci
    data.frame(x = xseq, y, ymin, ymax, se = pred$se.fit)
  } else {
    data.frame(x = xseq, y = as.vector(pred))
  }
}
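Since predictdf is not exported, one way to try it out is a local copy. The sketch below is only that: predictdf_loess is a renamed local copy of the function above, and the data frame d (with columns literally named x and y, which is what the newdata = data.frame(x = xseq) call expects) and the 80-point grid are assumptions for illustration.

# Local copy of the function shown above (renamed so it is not mistaken for the internal one)
predictdf_loess <- function(model, xseq, se, level) {
  pred <- stats::predict(model, newdata = data.frame(x = xseq), se = se)
  if (se) {
    y <- pred$fit
    ci <- pred$se.fit * stats::qt(level / 2 + .5, pred$df)
    data.frame(x = xseq, y, ymin = y - ci, ymax = y + ci, se = pred$se.fit)
  } else {
    data.frame(x = xseq, y = as.vector(pred))
  }
}

# The model must be fit on a column called x for the newdata lookup to work
d <- data.frame(x = mtcars$mpg, y = mtcars$cyl)
model <- loess(y ~ x, data = d)

# Evaluate the fit and its confidence band on an evenly spaced grid
xseq <- seq(min(d$x), max(d$x), length.out = 80)
band <- predictdf_loess(model, xseq, se = TRUE, level = 0.95)
head(band)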

After reading the above I was able to create my own data.frame of the predictions using:

#get the predictions i.e. the fit and se.fit vectors
pred <- predict(cars.lo, se=TRUE)
#create a data.frame from those
df2 <- data.frame(mpg=df$mpg, fit=pred$fit, se.fit=pred$se.fit * qt(0.95 / 2 + .5, pred$df))

Looking at predictdf.loess we can see that, at the 0.95 level, the upper boundary of the confidence interval is created as pred$fit + pred$se.fit * qt(0.95 / 2 + .5, pred$df) and the lower boundary as pred$fit - pred$se.fit * qt(0.95 / 2 + .5, pred$df).

Using those, we can create a flag for the points above or below those boundaries:

#make the flag
outerpoints <- +(df$cyl > df2$fit + df2$se.fit |  df$cyl < df2$fit - df2$se.fit)
#add flag to original data frame
df$outer <- outerpoints

The df$outer column is probably what the OP is looking for (it takes the value 1 if the point is outside the boundaries and 0 otherwise), but for completeness I plot it below.

Note that the unary + above is only used to convert the logical flag into a numeric.
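The steps above could also be wrapped in a small helper so the level is not hard-coded. This is a minimal sketch in base R; flag_outside_ci and its arguments are hypothetical names, not part of ggplot2, and it assumes the same default loess fit used above.

# Hypothetical helper: TRUE where a point lies outside the loess confidence band
flag_outside_ci <- function(data, x, y, level = 0.95) {
  # Build the formula y ~ x from the column names and fit the smoother
  fit <- loess(reformulate(x, response = y), data = data)
  # Fitted values and standard errors at the observed x values
  pred <- predict(fit, se = TRUE)
  # Half-width of the band, as in predictdf.loess
  half <- pred$se.fit * qt(level / 2 + .5, pred$df)
  data[[y]] > pred$fit + half | data[[y]] < pred$fit - half
}

df <- mtcars[,c("mpg","cyl")]
df$outer <- flag_outside_ci(df, "mpg", "cyl")
table(df$outer)

The result here is a logical column rather than 0/1, which works the same way with factor(outer) in the plot.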

Now if we plot it like this:

ggplot(df,aes(mpg,cyl)) +
  geom_point(aes(colour=factor(outer))) +
  geom_smooth() 

We can actually see the points inside and outside the confidence interval.

PS: For anyone interested in the upper and lower boundaries, they are created like this (speculation: the shaded areas are probably drawn with geom_ribbon, or something similar, which makes them look rounder and prettier):

#upper boundary
ggplot(df,aes(mpg,cyl)) +
   geom_point(aes(colour=factor(outer))) +
   geom_smooth() +
   geom_line(data=df2, aes(mpg , fit + se.fit , group=1), colour='red')

#lower boundary
ggplot(df,aes(mpg,cyl)) +
   geom_point(aes(colour=factor(outer))) +
   geom_smooth() +
   geom_line(data=df2, aes(mpg , fit - se.fit , group=1), colour='red')

Using ggplot_build, as in @hrbrmstr's nice solution, you can do this by simply passing geom_smooth a sequence of x values at which the error bounds should be calculated, and making it equal to the x-values of your points. Then you just check whether the y-values are within the range.

library(ggplot2)

## dummy data
df <- mtcars[,c("mpg","cyl")]

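ggplot(df, aes(mpg, cyl)) +
  # evaluate the smoother at the points' own x values; the original answer,
  # written for an older ggplot2, passed this as params=list(xseq=df$mpg)
  geom_smooth(xseq=df$mpg) -> gg

## Find the points within bounds
bounds <- ggplot_build(gg)$data[[1]]
## line the rows up with df by x, in case the built data was reordered
bounds <- bounds[match(df$mpg, bounds$x),]
df$inside <- with(df, bounds$ymin < cyl & bounds$ymax > cyl)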

## Add the points
gg + geom_point(data=df, aes(color=inside)) + theme_bw()
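If the band has already been computed on an evenly spaced grid (as in the ggplot_build data from the first answer), a similar in-range test can be approximated by linearly interpolating the band at each point's x value. The sketch below is base R only: the band data frame is rebuilt with loess on an 80-point grid as a stand-in for the built plot data, and the names grid, band, lo and hi are assumptions for illustration.

df <- mtcars[,c("mpg","cyl")]

# Stand-in for the band ggplot2 computes: loess fit evaluated on a grid
fit <- loess(cyl ~ mpg, df)
grid <- seq(min(df$mpg), max(df$mpg), length.out = 80)
pred <- predict(fit, newdata = data.frame(mpg = grid), se = TRUE)
half <- pred$se.fit * qt(0.95 / 2 + .5, pred$df)
band <- data.frame(x = grid, ymin = pred$fit - half, ymax = pred$fit + half)

# Interpolate the lower and upper bounds at the observed x values
# (rule = 2 clamps to the nearest grid end instead of returning NA)
lo <- approx(band$x, band$ymin, xout = df$mpg, rule = 2)$y
hi <- approx(band$x, band$ymax, xout = df$mpg, rule = 2)$y

# A point is inside when its y value falls between the two bounds
df$inside <- df$cyl >= lo & df$cyl <= hi
table(df$inside)

Because the bounds are interpolated rather than predicted exactly at each x, points very close to the band edge may be classified differently from the xseq approach.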
