Why is there a difference between do(lm…) and geom_smooth(method = "lm")?

Question

I have an external calibration curve that goes slightly into saturation, so I fit a second-order polynomial to it. I also have a data frame of measured samples whose concentrations I would like to know.

df_calibration=structure(list(dilution = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 
0.8, 0.9, 1, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1), 
    area = c(1000, 2000, 3000, 4000, 5000, 6000, 7000, 7800, 
    8200, 8500, 1200, 2200, 3200, 4200, 5200, 6200, 7200, 8000, 
    8400, 8700), substance = c("A", "A", "A", "A", "A", "A", 
    "A", "A", "A", "A", "b", "b", "b", "b", "b", "b", "b", "b", 
    "b", "b")), row.names = c(NA, 20L), class = "data.frame")

df_samples=structure(list(area = c(1100, 1800, 2500, 3200, 3900, 1300, 2000, 
2700, 3400, 4100), substance = c("A", "A", "A", "A", "A", "b", 
"b", "b", "b", "b")), row.names = c(NA, 10L), class = "data.frame")

To calculate the actual dilutions of the measured samples, I take the parameters generated by this fit:

library(tidyverse)
library(broom)  # provides tidy()

df_fits <- df_calibration %>%
  group_by(substance) %>%
  do(fit = lm(area ~ poly(dilution, 2), data = .)) %>%
  tidy(fit) %>%
  select(substance, term, estimate) %>%
  spread(term, estimate)

df_fits <- df_fits %>%
  rename(a = `poly(dilution, 2)2`, b = `poly(dilution, 2)1`, c = `(Intercept)`)

#join parameters with sample data
df_samples=left_join(df_samples,df_fits)

and this formula:

#calculate with general solution for polynomial 2nd order
df_samples$dilution_calc=
  (df_samples$b*(-1)+sqrt(df_samples$b^2-(4*df_samples$a*(df_samples$c-df_samples$area))))/(2*df_samples$a)

However, when I plot the result, I notice something very odd: the calculated x-values (dilutions) do not end up on the curve drawn by stat_smooth(). The additional dotted line for substance "A" is drawn with the parameters from the equation shown in the graph, which match the numbers in the data frame. So my calculations should be correct (or are they?). Why is there a difference? What am I doing wrong? How can I get the parameters from the fit done by stat_smooth()?

library(ggpmisc)  # provides stat_poly_eq()

my.formula <- y ~ poly(x, 2)
ggplot(df_calibration, aes(x = dilution, y = area)) +
  stat_smooth(method = "lm", se = FALSE, formula = my.formula) +
  # dotted line: parameters taken from the equation shown for substance "A"
  stat_function(fun = function(x) {5250 + (7980 * x) + (-905 * x^2)},
                inherit.aes = FALSE, linetype = "dotted") +
  stat_poly_eq(formula = my.formula,
               aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")),
               parse = TRUE) +
  geom_point(shape = 17) +
  geom_point(data = df_samples,
             aes(x = dilution_calc, y = area),
             shape = 1, color = "red") +
  facet_wrap(~substance, scales = "free")

Any suggestions will be highly appreciated :-)

Answer 1

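By default, poly() computes orthogonal polynomials. The coefficients of such a fit refer to the orthogonal basis columns, not to 1, x and x^2, so they cannot be plugged into the quadratic formula directly. You can turn orthogonalization off with the raw = TRUE argument.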
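As a rough illustration of the difference, the following base R sketch fits the calibration points of substance "A" from the question both ways. The object names fit_orth, fit_raw and same_curve are made up for this example. Both fits describe the same curve, but only the raw coefficients are the c, b and a of c + b*x + a*x^2; the orthogonal ones are the numbers the question plugged into the dotted stat_function() line.

```r
# Calibration points for substance "A", copied from df_calibration in the question
dilution <- (1:10) / 10
area <- c(1000, 2000, 3000, 4000, 5000, 6000, 7000, 7800, 8200, 8500)

# Default: orthogonal polynomial basis
fit_orth <- lm(area ~ poly(dilution, 2))
# raw = TRUE: ordinary powers dilution and dilution^2
fit_raw <- lm(area ~ poly(dilution, 2, raw = TRUE))

# The coefficient vectors differ because they refer to different basis columns
print(coef(fit_orth))
print(coef(fit_raw))

# The fitted curves themselves are identical (up to floating-point error)
same_curve <- isTRUE(all.equal(unname(fitted(fit_orth)), unname(fitted(fit_raw))))
print(same_curve)
```

This is why the smooth drawn with the orthogonal formula looks right while the back-calculated dilutions do not: the curve is the same, only the meaning of the coefficients changes.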

Note that the formula appears twice: once with the original variable names when fitting the regressions, and once in stat_smooth() with the generic variable names x and y. Apart from the variable names, it should be the same formula, with raw = TRUE in both places.

library("tidyverse")

# Define/import your data here....

df_fits <- df_calibration %>%
  group_by(substance) %>%
  do(fit = lm(area ~ poly(dilution, 2, raw = TRUE), data = .)) %>%
  broom::tidy(fit) %>%
  select(substance, term, estimate) %>%
  spread(term, estimate) %>%
  # It is simpler to rename the coefficients here
  setNames(c("substance", "c", "b", "a"))

# join parameters with sample data
df_samples <- left_join(df_samples, df_fits)

# calculate with general solution for polynomial 2nd order
df_samples <- df_samples %>%
  mutate(dilution_calc = (b * (-1) + sqrt(b^2 - (4 * a * (c - area)))) / (2 * a))

my.formula <- y ~ poly(x, 2, raw = TRUE)

df_calibration %>%
  ggplot(aes(x = dilution, y = area)) +
  stat_smooth(method = "lm", se = FALSE, formula = my.formula) +
  geom_point(shape = 17) +
  geom_point(
    data = df_samples,
    aes(x = dilution_calc, y = area),
    shape = 1, color = "red"
  ) +
  facet_wrap(~substance, scales = "free")

Created on 2019-03-31 by the reprex package (v0.2.1)
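If you want to double-check the back-calculation without any plotting, one option is a round trip: fit per substance, invert with the quadratic formula, then predict the area at the calculated dilution and compare it with the measured area. The sketch below uses base R only. It is not part of the answer's reprex; the names fits, check and area_back are invented here, and the data frames are rebuilt compactly with the same values as in the question.

```r
# Same values as df_calibration and df_samples in the question
df_calibration <- data.frame(
  dilution = rep((1:10) / 10, times = 2),
  area = c(1000, 2000, 3000, 4000, 5000, 6000, 7000, 7800, 8200, 8500,
           1200, 2200, 3200, 4200, 5200, 6200, 7200, 8000, 8400, 8700),
  substance = rep(c("A", "b"), each = 10)
)
df_samples <- data.frame(
  area = c(1100, 1800, 2500, 3200, 3900, 1300, 2000, 2700, 3400, 4100),
  substance = rep(c("A", "b"), each = 5)
)

# One raw-polynomial fit per substance
fits <- lapply(split(df_calibration, df_calibration$substance), function(d) {
  lm(area ~ poly(dilution, 2, raw = TRUE), data = d)
})

# Invert each fit with the quadratic formula, then predict the area back
check <- do.call(rbind, lapply(names(fits), function(s) {
  cf <- unname(coef(fits[[s]]))  # order: intercept (c), linear (b), quadratic (a)
  smp <- df_samples[df_samples$substance == s, ]
  smp$dilution_calc <-
    (-cf[2] + sqrt(cf[2]^2 - 4 * cf[3] * (cf[1] - smp$area))) / (2 * cf[3])
  smp$area_back <- unname(
    predict(fits[[s]], newdata = data.frame(dilution = smp$dilution_calc))
  )
  smp
}))

# Largest discrepancy between measured and re-predicted area
print(max(abs(check$area_back - check$area)))
```

With raw coefficients the discrepancy should be at floating-point level. Note that the quadratic formula has two roots; the sign used here follows the answer above and, for a curve that flattens towards saturation, picks the root inside the calibration range.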

Answer 1 (2, accepted, 2019-03-31 09:53:20)
