Can we neatly align the regression equation and R2 and p value?

What is the best (easiest) approach to neatly add the regression equation, the R2, and the p-value (for the equation) to a ggplot plot? Ideally it should be compatible with groups and faceting.

This first plot has the regression equation plus the R2 and p-value by group using ggpubr, but they are not aligned. Am I missing something? Could they be included as one string?

library(ggplot2)
library(ggpubr)

ggplot(mtcars, aes(x = wt, y = mpg, group = cyl))+
  geom_smooth(method="lm")+
  geom_point()+
  stat_regline_equation()+
  stat_cor(aes(label = paste(..rr.label.., ..p.label.., sep = "*`,`~")),
           label.x.npc = "centre")

plot1

Here is an option with ggpmisc, which does some odd placement.
EDIT: The odd placement was caused by geom = 'text', which I've commented out to give better placement, and I added label.x = "right" to stop overplotting. We still have misalignment, as with ggpubr, due to the superscript issue flagged by @dc37.

#https://stackoverflow.com/a/37708832/4927395
library(ggpmisc)

ggplot(mtcars, aes(x = wt, y = mpg, group = cyl))+
  geom_smooth(method="lm")+
  geom_point()+
  stat_poly_eq(formula = y ~ x,
               aes(label = paste(..eq.label.., ..rr.label.., sep = "*`,`~")),
               parse = TRUE)+
  stat_fit_glance(method = 'lm',
                  method.args = list(formula = y ~ x),
                  #geom = 'text',
                  label.x = "right", # added to stop overplotting
                  aes(label = paste("P-value = ", signif(..p.value.., digits = 4), sep = "")))

plot2_edited

I did find a good solution for bringing the relevant stats together, but that requires creating the regression outside ggplot, and a pile of string manipulation fluff. Is this as easy as it gets? Also, it doesn't (as currently coded) deal with the grouping, and wouldn't deal with faceting.

#https://stackoverflow.com/a/51974753/4927395
#Solution as one string, equation, R2 and p-value
lm_eqn <- function(df, y, x){
  formula <- as.formula(sprintf('%s ~ %s', y, x))
  m <- lm(formula, data = df)
  # formatting the values into a summary string to print out
  # ~ gives some space, but the equals sign and comma need to be quoted
  eq <- substitute(italic(target) == a + b %.% italic(input)*","~~italic(r)^2~"="~r2*","~~p~"="~italic(pvalue),
                   list(target = y,
                        input = x,
                        a = format(as.vector(coef(m)[1]), digits = 2),
                        b = format(as.vector(coef(m)[2]), digits = 2),
                        r2 = format(summary(m)$r.squared, digits = 3),
                        # getting the pvalue is painful
                        pvalue = format(summary(m)$coefficients[2,'Pr(>|t|)'], digits=1)
                   )
  )
  as.character(as.expression(eq))
}

ggplot(mtcars, aes(x = wt, y = mpg, group=cyl))+
  geom_point() +
  # arguments are (df, y, x): mpg is the response, wt the predictor
  geom_text(x=3,y=30,label=lm_eqn(mtcars, 'mpg','wt'),color='red',parse=TRUE) +
  geom_smooth(method='lm')
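
Since the function above fits a single model to the whole data frame, one way to extend it to groups is to fit the model once per group first. The following is only a minimal base-R sketch under that assumption; fit_by_group is a hypothetical helper name, not a function from ggplot2, ggpubr or ggpmisc.

```r
# Hypothetical helper (an assumption for illustration, not from any package):
# fit y ~ x separately within each level of `group` and collect the statistics.
fit_by_group <- function(df, y, x, group) {
  pieces <- split(df, df[[group]])
  rows <- lapply(names(pieces), function(g) {
    s <- summary(lm(reformulate(x, response = y), data = pieces[[g]]))
    data.frame(group     = g,
               intercept = s$coefficients[1, "Estimate"],
               slope     = s$coefficients[2, "Estimate"],
               r2        = s$r.squared,
               # p-value of the slope term
               p         = s$coefficients[2, "Pr(>|t|)"])
  })
  do.call(rbind, rows)
}

stats <- fit_by_group(mtcars, "mpg", "wt", "cyl")
print(stats)
```

The resulting data frame has one row per group, so it could be passed as the data argument of a geom_text() layer once a label column has been built from it.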

I have updated 'ggpmisc' to make this easy. Version 0.3.4 is now on its way to CRAN, the source package is on-line, and binaries should be built in a few days' time.

library(ggpmisc) # version >= 0.3.4 !!

ggplot(mtcars, aes(x = wt, y = mpg, group = cyl)) +
  geom_smooth(method="lm")+
  geom_point()+
  stat_poly_eq(formula = y ~ x, 
               aes(label = paste(..eq.label.., ..rr.label.., ..p.value.label.., sep = "*`,`~")), 
               parse = TRUE,
               label.x.npc = "right",
               vstep = 0.05) # sets vertical spacing
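
Because ..p.value.label.. depends on the ggpmisc version, it can help to check what is installed before running the plot. This is a small sketch using only base R; the variable names min_version and has_p_label are made up for illustration.

```r
# Check whether an installed ggpmisc is at least 0.3.4 (the version named above).
min_version <- "0.3.4"
has_p_label <- requireNamespace("ggpmisc", quietly = TRUE) &&
  utils::packageVersion("ggpmisc") >= min_version

if (!has_p_label) {
  message("ggpmisc >= ", min_version, " is needed for ..p.value.label..")
}
```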

A possible solution with ggpubr is to place your equation formula and R2 values on top of the graph by passing Inf to label.y and Inf or -Inf to label.x (depending on whether you want it on the right or left side of the plot).

The two texts won't be aligned because of the superscript 2 on R. So, you will have to tweak it a little bit by using vjust and hjust in order to align both texts.

Then, it will work even with faceted graphs with different scales.

library(ggplot2)
library(ggpubr)

ggplot(mtcars, aes(x = wt, y = mpg, group = cyl))+
  geom_smooth(method="lm")+
  geom_point()+
  stat_regline_equation(label.x = -Inf, label.y = Inf, vjust = 1.5, hjust = -0.1, size = 3)+
  stat_cor(aes(label = paste(..rr.label.., ..p.label.., sep = "*`,`~")),
           label.y= Inf, label.x = Inf, vjust = 1, hjust = 1.1, size = 3)+
  facet_wrap(~cyl, scales = "free")

Does it answer your question?


EDIT: Alternative by manually adding the equation

As described in your similar question (Label ggplot groups using equation with ggpmisc), you can add your equation by passing the text to geom_text:

library(dplyr)

df_mtcars <- mtcars %>% mutate(factor_cyl = as.factor(cyl))

df_label <- df_mtcars %>% group_by(factor_cyl) %>%
  summarise(Inter = lm(mpg~wt)$coefficients[1],
            Coeff = lm(mpg~wt)$coefficients[2],
            pval = summary(lm(mpg~wt))$coefficients[2,4],
            r2 = summary(lm(mpg~wt))$r.squared) %>% ungroup() %>%
  #mutate(ypos = max(df_mtcars$mpg)*(1-0.05*row_number())) %>%
  #mutate(Label2 = paste(factor_cyl,"~Cylinders:~", "italic(y)==",round(Inter,3),ifelse(Coeff <0,"-","+"),round(abs(Coeff),3),"~italic(x)",sep ="")) %>%
  mutate(Label = paste("italic(y)==",round(Inter,3),ifelse(Coeff <0,"-","+"),round(abs(Coeff),3),"~italic(x)",
                       "~~~~italic(R^2)==",round(r2,3),"~~italic(p)==",round(pval,3),sep =""))

# A tibble: 3 x 6
  factor_cyl Inter Coeff   pval    r2 Label                                                                    
  <fct>      <dbl> <dbl>  <dbl> <dbl> <chr>                                                                    
1 4           39.6 -5.65 0.0137 0.509 italic(y)==39.571-5.647~italic(x)~~~~italic(R^2)==0.509~~italic(p)==0.014
2 6           28.4 -2.78 0.0918 0.465 italic(y)==28.409-2.78~italic(x)~~~~italic(R^2)==0.465~~italic(p)==0.092 
3 8           23.9 -2.19 0.0118 0.423 italic(y)==23.868-2.192~italic(x)~~~~italic(R^2)==0.423~~italic(p)==0.012

And you can use it for geom_text as follows:

ggplot(df_mtcars,aes(x = wt, y = mpg, group = factor_cyl, colour= factor_cyl))+
  geom_smooth(method="lm")+
  geom_point()+
  geom_text(data = df_label,
            aes(x = -Inf, y = Inf, 
                label = Label, color = factor_cyl), 
          show.legend = FALSE, parse = TRUE, size = 3,vjust = 1, hjust = 0)+
  facet_wrap(~factor_cyl)

At least, it solves the issue of the misalignment due to the superscript 2 on R.
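
One detail of the labels above: round() drops trailing zeros, which is why the slope for 6 cylinders prints as 2.78 while the others show three decimals. Formatting with sprintf() alone is not enough when parse = TRUE, because the parser reads 2.780 back as the number 2.78. A possible workaround, sketched here with a hypothetical helper make_label (not part of the code above), is to wrap each formatted number in quotes so it is treated as text:

```r
# Hypothetical helper: build one plotmath label with a fixed number of decimals.
# Numbers are quoted so that trailing zeros survive parsing.
make_label <- function(inter, coeff, r2, pval, digits = 3) {
  fmt <- paste0('"%.', digits, 'f"')   # e.g. '"%.3f"' gives "\"2.780\""
  paste0("italic(y)==", sprintf(fmt, inter),
         ifelse(coeff < 0, "-", "+"), sprintf(fmt, abs(coeff)), "~italic(x)",
         "~~~~italic(R^2)==", sprintf(fmt, r2),
         "~~italic(p)==", sprintf(fmt, pval))
}

lab <- make_label(28.409, -2.780, 0.465, 0.0918)
cat(lab, "\n")

# The string must still be a valid expression for parse = TRUE to work.
parsed <- parse(text = lab)
```

The function is vectorised, so it could replace the paste() call inside mutate() if fixed decimals are wanted.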

Here I use ggpmisc, with one call to stat_poly_eq() for the equation (centre top), and one call to stat_fit_glance() for the stats (p-value and R2). The secret sauce for the alignment is using yhat as the left-hand side of the equation, as the hat approximates the text height that then matches the superscript for the R2. Hat tip to Pedro Aphalo for the yhat, shown here.

It would be great to have them as one string, which means horizontal alignment would not be a problem, and then locating it conveniently in the plot space would be easier. I've raised this as issues at ggpubr and ggpmisc.

I'll happily accept another better answer!

library(dplyr)   # for %>% and mutate()
library(ggpmisc)

df_mtcars <- mtcars %>% mutate(factor_cyl = as.factor(cyl))

my_formula <- y ~ x

ggplot(df_mtcars, aes(x = wt, y = mpg, group = factor_cyl, colour= factor_cyl))+
  geom_smooth(method="lm")+
  geom_point()+
  stat_poly_eq(formula = my_formula,
               label.x = "centre",
               eq.with.lhs = "italic(hat(y))~`=`~",
               aes(label = paste(..eq.label.., sep = "~~~")), 
               parse = TRUE)+
  stat_fit_glance(method = 'lm',
                  method.args = list(formula = my_formula),
                  #geom = 'text',
                  label.x = "right", #added to prevent overplotting
                  aes(label = paste("~italic(p) ==", round(..p.value.., digits = 3),
                                    "~italic(R)^2 ==", round(..r.squared.., digits = 2),
                                    sep = "~")),
                  parse=TRUE)+
  theme_minimal()

plot result

Note that faceting also works neatly, and you could have different variables for the facet and grouping and everything still works.

plot facet result

Note: If you do use the same variable for group and for facet, adding label.y = Inf to each call will force the label to the top of each facet (hat tip @dc37, in another answer to this question).
