ggplot2 geom_smooth: extended models for method = lm

Question

I would like to use geom_smooth to get a fitted line from a certain linear regression model.

It seems to me that the formula can only take x and y and not any additional parameter.

To show more clearly what I want:

library(dplyr)
library(ggplot2)
set.seed(35413)
df <- data.frame(pred = runif(100,10,100),
           factor = sample(c("A","B"), 100, replace = TRUE)) %>%
  mutate(
    outcome = 100 + 10*pred + 
    ifelse(factor=="B", 200, 0) + 
    ifelse(factor=="B", 4, 0)*pred +
    rnorm(100,0,60))

With

ggplot(df, aes(x=pred, y=outcome, color=factor)) +
  geom_point(aes(color=factor)) +
  geom_smooth(method = "lm") +
  theme_bw()

I produce fitted lines that, because of the color=factor mapping, are essentially the fitted lines of the linear model lm(outcome ~ pred*factor, df).
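The equivalence claimed above can be checked numerically. The following is a minimal sketch in base R, assuming the same simulated data as in the question (rebuilt here without dplyr so the block runs on its own); the names full, by_group, a_line and b_line are illustrative only. It compares the per-group fits, which is what geom_smooth computes for each colour, with the lines implied by the interaction model.

# Rebuild the example data (same recipe as in the question, base R only)
set.seed(35413)
n <- 100
df <- data.frame(
  pred   = runif(n, 10, 100),
  factor = sample(c("A", "B"), n, replace = TRUE)
)
df$outcome <- 100 + 10 * df$pred +
  ifelse(df$factor == "B", 200, 0) +
  ifelse(df$factor == "B", 4, 0) * df$pred +
  rnorm(n, 0, 60)

# Interaction model: separate intercept and slope for each level
full <- lm(outcome ~ pred * factor, data = df)

# One simple regression per level, as a grouped geom_smooth(method = "lm") does
by_group <- lapply(split(df, df$factor),
                   function(d) lm(outcome ~ pred, data = d))

# Lines implied by the interaction model
cf <- coef(full)
a_line <- c(cf[["(Intercept)"]], cf[["pred"]])
b_line <- c(cf[["(Intercept)"]] + cf[["factorB"]],
            cf[["pred"]] + cf[["pred:factorB"]])

# Both comparisons should agree up to floating-point error
all.equal(a_line, unname(coef(by_group$A)))
all.equal(b_line, unname(coef(by_group$B)))

Only the fitted lines coincide. The confidence bands need not, because the interaction model pools the residual variance across both levels while the separate fits estimate it per level.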

In some cases, however, I prefer the lines to be the output of a different model fit, like lm(outcome ~ pred + factor, df), for which I can use something like:

fit <- lm(outcome ~ pred+factor, df)
predval <- expand.grid(
  pred = seq(
    min(df$pred), max(df$pred), length.out = 1000),
  factor = unique(df$factor)) %>%
  mutate(outcome = predict(fit, newdata = .))

ggplot(df, aes(x=pred, y=outcome, color=factor)) +
  geom_point() +
  geom_line(data = predval) +
  theme_bw()

which results in two parallel fitted lines, one per level of factor (same slope, different intercepts).

My question: is there a way to produce the latter graph using geom_smooth instead? I know there is a formula = option in geom_smooth, but I can't make something like formula = y ~ x + factor or formula = y ~ x + color (as I defined color = factor) work.

Answer 1

This is a very interesting question. Probably the main reason why geom_smooth is so "resistant" to custom models with multiple variables is that it is limited to producing 2-D curves; consequently, its arguments are designed for two-dimensional data (i.e. formula = response variable ~ independent variable).

The trick to getting what you requested is to use the mapping argument of geom_smooth instead of formula. As you've probably seen from the documentation, formula only lets you specify the mathematical structure of the model in terms of x and y (e.g. linear, quadratic, etc.). Conversely, the mapping argument lets you directly supply new y-values, such as the output of a custom linear model obtained with predict().

Note that, by default, inherit.aes is set to TRUE, so your plotted regressions will be coloured appropriately by your categorical variable. Here's the code:

# original plot
plot1 <- ggplot(df, aes(x=pred, y=outcome, color=factor)) +
  geom_point(aes(color=factor)) +
  geom_smooth(method = "lm") +
  ggtitle("outcome ~ pred") +
  theme_bw()

# declare new model here
plm <- lm(formula = outcome ~ pred + factor, data=df)

# plot with lm for outcome ~ pred + factor
plot2 <-ggplot(df, aes(x=pred, y=outcome, color=factor)) +
  geom_point(aes(color=factor)) +
  geom_smooth(method = "lm", mapping=aes(y=predict(plm,df))) +
  ggtitle("outcome ~ pred + factor") +
  theme_bw()
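
Why this works can be checked without plotting. The sketch below is base R only and rebuilds the question's data so that it runs on its own; fitted_plm, refits, slopes and max_resid are illustrative names, not part of the answer's code. Within each level of factor, the predictions from plm are an exact linear function of pred, so the per-group lm that geom_smooth runs on them recovers the common slope with essentially zero residuals.

# Rebuild the example data (same recipe as in the question, base R only)
set.seed(35413)
n <- 100
df <- data.frame(
  pred   = runif(n, 10, 100),
  factor = sample(c("A", "B"), n, replace = TRUE)
)
df$outcome <- 100 + 10 * df$pred +
  ifelse(df$factor == "B", 200, 0) +
  ifelse(df$factor == "B", 4, 0) * df$pred +
  rnorm(n, 0, 60)

# Additive model, as in the answer
plm <- lm(outcome ~ pred + factor, data = df)
df$fitted_plm <- predict(plm, df)

# Mimic geom_smooth: regress the mapped y (the predictions) on x within each group
refits <- lapply(split(df, df$factor),
                 function(d) lm(fitted_plm ~ pred, data = d))

# Each group recovers the common slope of plm ...
slopes <- sapply(refits, function(m) unname(coef(m)["pred"]))
# ... and the refits have (numerically) no residual variation
max_resid <- sapply(refits, function(m) max(abs(resid(m))))

slopes
unname(coef(plm)["pred"])
max_resid

One consequence: because the refits are essentially perfect, the ribbon that geom_smooth draws by default collapses to roughly zero width and says nothing about the uncertainty of plm. Passing se = FALSE to geom_smooth in plot2 makes that explicit. If a real confidence band is wanted, the predict-and-geom_line approach from the question is probably the more direct route.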
