
ggplot2 geom_smooth, extended model for method=lm

I would like to use geom_smooth to get a fitted line from a certain linear regression model.

It seems to me that the formula can only take x and y, not any additional variable.

To show more clearly what I want:

library(dplyr)
library(ggplot2)
set.seed(35413)
df <- data.frame(pred = runif(100,10,100),
           factor = sample(c("A","B"), 100, replace = TRUE)) %>%
  mutate(
    outcome = 100 + 10*pred + 
    ifelse(factor=="B", 200, 0) + 
    ifelse(factor=="B", 4, 0)*pred +
    rnorm(100,0,60))

With

ggplot(df, aes(x=pred, y=outcome, color=factor)) +
  geom_point(aes(color=factor)) +
  geom_smooth(method = "lm") +
  theme_bw()

I produce fitted lines that, because of the color=factor mapping, are basically the output of the linear model lm(outcome ~ pred*factor, df).
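
A quick way to check this claim, sketched in base R and assuming the data are simulated as in the question: geom_smooth(method = "lm") fits one regression per colour group, and those per-group lines should coincide with the lines implied by the single interaction model. The names fit_A, fit_B, fit_int and b are illustrative only.

```r
# Simulated data as in the question (base R only, no dplyr needed)
set.seed(35413)
df <- data.frame(pred = runif(100, 10, 100),
                 factor = sample(c("A", "B"), 100, replace = TRUE))
df$outcome <- 100 + 10 * df$pred +
  ifelse(df$factor == "B", 200, 0) +
  ifelse(df$factor == "B", 4, 0) * df$pred +
  rnorm(100, 0, 60)

# One separate regression per level: what geom_smooth does for each colour group
fit_A <- lm(outcome ~ pred, data = subset(df, factor == "A"))
fit_B <- lm(outcome ~ pred, data = subset(df, factor == "B"))

# A single model with an interaction between pred and factor
fit_int <- lm(outcome ~ pred * factor, data = df)
b <- coef(fit_int)

# Level A: intercept and slope are the baseline coefficients
all.equal(unname(coef(fit_A)), unname(b[c("(Intercept)", "pred")]))

# Level B: baseline coefficients plus the factorB offsets
all.equal(unname(coef(fit_B)),
          unname(c(b["(Intercept)"] + b["factorB"],
                   b["pred"] + b["pred:factorB"])))
```

Both comparisons should return TRUE. The lines agree, although the standard errors need not, since the interaction model pools the residual variance across both levels.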


In some cases, however, I prefer the lines to be the output of a different model fit, like lm(outcome ~ pred + factor, df), for which I can use something like:

fit <- lm(outcome ~ pred+factor, df)
predval <- expand.grid(
  pred = seq(
    min(df$pred), max(df$pred), length.out = 1000),
  factor = unique(df$factor)) %>%
  mutate(outcome = predict(fit, newdata = .))

ggplot(df, aes(x=pred, y=outcome, color=factor)) +
  geom_point() +
  geom_line(data = predval) +
  theme_bw()

which results in:


My question: is there a way to produce the latter graph using geom_smooth instead? I know there is a formula = option in geom_smooth, but I can't make something like formula = y ~ x + factor or formula = y ~ x + color (as I defined color = factor) work.

This is a very interesting question. Probably the main reason why geom_smooth is so "resistant" to custom models with multiple variables is that it is limited to producing 2-D curves; consequently, its arguments are designed for handling two-dimensional data (i.e. formula = response variable ~ independent variable).

The trick to getting what you requested is to use the mapping argument within geom_smooth instead of formula. As you've probably seen from the documentation, formula only allows you to specify the mathematical structure of the model (e.g. linear, quadratic, etc.). Conversely, the mapping argument allows you to directly specify new y-values, such as the output of a custom linear model that you can obtain using predict().

Note that, by default, inherit.aes is set to TRUE, so your plotted regressions will be coloured appropriately by your categorical variable. Here's the code:

# original plot
plot1 <- ggplot(df, aes(x=pred, y=outcome, color=factor)) +
  geom_point(aes(color=factor)) +
  geom_smooth(method = "lm") +
  ggtitle("outcome ~ pred") +
  theme_bw()

# declare new model here
plm <- lm(formula = outcome ~ pred + factor, data=df)

# plot with lm for outcome ~ pred + factor
plot2 <-ggplot(df, aes(x=pred, y=outcome, color=factor)) +
  geom_point(aes(color=factor)) +
  geom_smooth(method = "lm", mapping=aes(y=predict(plm,df))) +
  ggtitle("outcome ~ pred + factor") +
  theme_bw()
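
Why the remapping works, as a sketch on the same simulated data: with y replaced by predict(plm, df), geom_smooth(method = "lm") still fits a separate regression per colour group, but the values it regresses are already exactly linear in pred within each level, so each per-group fit reproduces the additive model's line. The column fitted_add and the objects refit_B and b are hypothetical names introduced for this illustration.

```r
# Simulated data as in the question (base R only)
set.seed(35413)
df <- data.frame(pred = runif(100, 10, 100),
                 factor = sample(c("A", "B"), 100, replace = TRUE))
df$outcome <- 100 + 10 * df$pred +
  ifelse(df$factor == "B", 200, 0) +
  ifelse(df$factor == "B", 4, 0) * df$pred +
  rnorm(100, 0, 60)

# Additive model: common slope, separate intercepts
plm <- lm(outcome ~ pred + factor, data = df)
df$fitted_add <- predict(plm, df)

# What a per-group lm sees for level B once y is remapped to the predictions
refit_B <- lm(fitted_add ~ pred, data = subset(df, factor == "B"))

# The refit recovers the additive model's line for level B
b <- coef(plm)
all.equal(unname(coef(refit_B)),
          unname(c(b["(Intercept)"] + b["factorB"], b["pred"])))

# The refit has essentially no residual variation left
max(abs(residuals(refit_B)))
```

Because the residuals of such a refit are numerically zero, the shaded band drawn by geom_smooth in plot2 collapses onto the line and does not describe the uncertainty of plm.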

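
If a band reflecting the uncertainty of the additive model itself is wanted, one possible alternative (not part of the answer above) is to take pointwise confidence limits from predict() and draw them with geom_ribbon and geom_line. This is a minimal sketch; df_fit and plot3 are illustrative names, and the 95% level is an assumption.

```r
library(ggplot2)

# Simulated data as in the question
set.seed(35413)
df <- data.frame(pred = runif(100, 10, 100),
                 factor = sample(c("A", "B"), 100, replace = TRUE))
df$outcome <- 100 + 10 * df$pred +
  ifelse(df$factor == "B", 200, 0) +
  ifelse(df$factor == "B", 4, 0) * df$pred +
  rnorm(100, 0, 60)

# Additive model: common slope, separate intercepts
plm <- lm(outcome ~ pred + factor, data = df)

# Fitted values with pointwise confidence limits (columns fit, lwr, upr)
ci <- predict(plm, newdata = df, interval = "confidence", level = 0.95)
df_fit <- cbind(df, as.data.frame(ci))
df_fit <- df_fit[order(df_fit$pred), ]  # sort by x for clarity

# Points, confidence ribbon and fitted line, all coloured by factor
plot3 <- ggplot(df_fit, aes(x = pred, y = outcome, color = factor)) +
  geom_point() +
  geom_ribbon(aes(ymin = lwr, ymax = upr, fill = factor),
              alpha = 0.2, colour = NA) +
  geom_line(aes(y = fit)) +
  ggtitle("outcome ~ pred + factor") +
  theme_bw()
```

Printing plot3 draws the figure. Unlike geom_smooth, this route leaves the model fit entirely outside ggplot2, so any model with a predict() method could be substituted for plm.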
