Predicting values on a curve in R

I am trying to predict the values of a curve in R using this data.

   Distance Waggle_time
1       100        9.45
2       200        7.90
3       300        7.04
4       385        6.49
5       400        6.34
6       500        6.01
7       600        5.59
8       700        5.07
9       800        4.79
10      900        4.73
11     1000        4.62
12     1100        4.34
13     1250        4.33
14     1300        4.30
15     1400        4.10
16     1500        4.06
17     2000        3.31
18     2500        3.13
19     3000        2.77
20     3500        2.65
21     4000        2.52
22     4500        2.30
23     5000        2.22
24     6000        1.93
25     7000        1.71
26     8000        1.62
27     8500        1.46
28     9500        1.36

I have tried to use the predict function by creating a linear regression model and providing the x values that should be used to predict the y values, but this gives me data that is completely wrong. I understand that I am probably on completely the wrong path, so any help would be very much appreciated.

A few options are shown below. It's not clear to me whether you need help with the model fitting (statistical help) or with the prediction of new values (technical help). If the former, you should look at Cross Validated.

df <- structure(list(Distance = c(100L, 200L, 300L, 385L, 400L, 500L, 
600L, 700L, 800L, 900L, 1000L, 1100L, 1250L, 1300L, 1400L, 1500L, 
2000L, 2500L, 3000L, 3500L, 4000L, 4500L, 5000L, 6000L, 7000L, 
8000L, 8500L, 9500L), Waggle_time = c(9.45, 7.9, 7.04, 6.49, 
6.34, 6.01, 5.59, 5.07, 4.79, 4.73, 4.62, 4.34, 4.33, 4.3, 4.1, 
4.06, 3.31, 3.13, 2.77, 2.65, 2.52, 2.3, 2.22, 1.93, 1.71, 1.62, 
1.46, 1.36)), class = "data.frame", row.names = c("1", "2", "3", 
"4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", 
"16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", 
"27", "28"))



# Smoothing spline fit, predicted on a grid of 100 evenly spaced distances
splfit <- smooth.spline(x = df$Distance, y = df$Waggle_time, spar = 0.5)
splpred <- data.frame(Distance = seq(min(df$Distance), max(df$Distance), length.out = 100))
splpred$Waggle_time <- predict(splfit, x = splpred$Distance)$y

# GLM alternative; the commented-out line is a Gamma fit with an inverse link
# lmfit <- glm(Waggle_time ~ Distance, df, family = Gamma(link = "inverse"))
lmfit <- glm(Waggle_time ~ Distance, df, family = inverse.gaussian(link = "1/mu^2"))
glmpred <- splpred
glmpred$Waggle_time <- predict(lmfit, newdata = glmpred, type = "response")


plot(Waggle_time ~ Distance, df)
lines(Waggle_time ~ Distance, splpred, col = 2)
lines(Waggle_time ~ Distance, glmpred, col = 3)
legend("topright", legend = c("spline", "inv. Gaussian"), col = 2:3, lty = 1)
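
If you only need predictions at a handful of distances rather than along a grid, the same two fits can be queried directly. This is a minimal sketch: the distances in new_points are made-up illustrative values, and the data frame is rebuilt compactly so the snippet runs on its own.

```r
# Same data as above, rebuilt compactly so the snippet is self-contained
df <- data.frame(
  Distance = c(100, 200, 300, 385, 400, 500, 600, 700, 800, 900, 1000, 1100,
               1250, 1300, 1400, 1500, 2000, 2500, 3000, 3500, 4000, 4500,
               5000, 6000, 7000, 8000, 8500, 9500),
  Waggle_time = c(9.45, 7.90, 7.04, 6.49, 6.34, 6.01, 5.59, 5.07, 4.79, 4.73,
                  4.62, 4.34, 4.33, 4.30, 4.10, 4.06, 3.31, 3.13, 2.77, 2.65,
                  2.52, 2.30, 2.22, 1.93, 1.71, 1.62, 1.46, 1.36)
)

splfit <- smooth.spline(x = df$Distance, y = df$Waggle_time, spar = 0.5)
glmfit <- glm(Waggle_time ~ Distance, df,
              family = inverse.gaussian(link = "1/mu^2"))

# Hypothetical distances of interest (illustrative, not from the question)
new_points <- data.frame(Distance = c(250, 1750, 5500))

# predict() on a smooth.spline takes x values and returns a list with x and y
new_points$spline <- predict(splfit, x = new_points$Distance)$y

# predict() on a glm takes a data frame whose column names match the formula;
# type = "response" returns values on the Waggle_time scale, not the link scale
new_points$inv_gaussian <- predict(glmfit, newdata = new_points, type = "response")

new_points
```

Note that the two predict methods take their new values differently (a vector x for the spline, a data frame newdata for the GLM), which is an easy thing to trip over.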


Your data are described better by a logarithmic model than by a linear one. Although you can do a formal model comparison to find this out, it's clear just from plotting the data and some candidate best fit lines:

library(tidyverse)

ggplot(df, aes(x = Distance, y = Waggle_time)) + 
  geom_point() + 
  geom_smooth(formula = y ~ log(x), method = "lm", se = FALSE, colour = "blue") +
  geom_smooth(formula = y ~ x, method = "lm", se = FALSE, colour = "red")
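
The formal comparison mentioned above could look something like the following. This is only a sketch under my own assumptions: both candidates are ordinary lm fits with the same untransformed response, so their AIC values are directly comparable, and the names fit_linear and fit_log are mine.

```r
# Same data as in the question, rebuilt compactly so the snippet is self-contained
df <- data.frame(
  Distance = c(100, 200, 300, 385, 400, 500, 600, 700, 800, 900, 1000, 1100,
               1250, 1300, 1400, 1500, 2000, 2500, 3000, 3500, 4000, 4500,
               5000, 6000, 7000, 8000, 8500, 9500),
  Waggle_time = c(9.45, 7.90, 7.04, 6.49, 6.34, 6.01, 5.59, 5.07, 4.79, 4.73,
                  4.62, 4.34, 4.33, 4.30, 4.10, 4.06, 3.31, 3.13, 2.77, 2.65,
                  2.52, 2.30, 2.22, 1.93, 1.71, 1.62, 1.46, 1.36)
)

# Two candidate models with the same response and a different predictor scale
fit_linear <- lm(Waggle_time ~ Distance, data = df)
fit_log    <- lm(Waggle_time ~ log(Distance), data = df)

# Lower AIC means better support for that model
AIC(fit_linear, fit_log)

# Adjusted R-squared as a second, rougher check
c(linear = summary(fit_linear)$adj.r.squared,
  log    = summary(fit_log)$adj.r.squared)
```

The comparison is only valid because the response is the same in both models; if you transform the response instead (as below), AIC values are no longer comparable without a correction.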


As such, you can log-transform your data, fit a linear model, and use it to predict your distance values. You can then exponentiate the fitted values if you'd like the predictions back on the original scale.

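library(broom)  # augment() lives in broom, which library(tidyverse) does not attach

# Model log(Distance) as a linear function of Waggle_time
df <- mutate(df, logdistance = log(Distance))
modelfit <- lm(logdistance ~ Waggle_time, df)
to_predict <- tibble(Waggle_time = c(12, 10, 8))

# Predict values with augment, then back-transform to the original scale
augment(modelfit, newdata = to_predict) %>%
  mutate(truefitted = exp(.fitted))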

Output, with predictions of Distance on your original scale in the column truefitted:

  Waggle_time .fitted .se.fit truefitted
        <dbl>   <dbl>   <dbl>      <dbl>
1          12    2.74  0.116        15.5
2          10    3.90  0.0885       49.6
3           8    5.07  0.0623      159. 
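
If you'd also like a rough idea of the uncertainty, base R's predict() can return prediction intervals, which can be back-transformed in the same way. This is a sketch rather than part of the answer above: it fits the same model with log(Distance) written directly in the formula, and the 95% level is my own choice.

```r
# Same data as in the question, rebuilt compactly so the snippet is self-contained
df <- data.frame(
  Distance = c(100, 200, 300, 385, 400, 500, 600, 700, 800, 900, 1000, 1100,
               1250, 1300, 1400, 1500, 2000, 2500, 3000, 3500, 4000, 4500,
               5000, 6000, 7000, 8000, 8500, 9500),
  Waggle_time = c(9.45, 7.90, 7.04, 6.49, 6.34, 6.01, 5.59, 5.07, 4.79, 4.73,
                  4.62, 4.34, 4.33, 4.30, 4.10, 4.06, 3.31, 3.13, 2.77, 2.65,
                  2.52, 2.30, 2.22, 1.93, 1.71, 1.62, 1.46, 1.36)
)

# Same model as above, with the log transform inside the formula
modelfit <- lm(log(Distance) ~ Waggle_time, data = df)
to_predict <- data.frame(Waggle_time = c(12, 10, 8))

# Matrix with columns fit, lwr, upr on the log scale
log_pred <- predict(modelfit, newdata = to_predict,
                    interval = "prediction", level = 0.95)

# exp() is monotone, so exponentiating the endpoints gives an interval
# for Distance on the original scale
result <- cbind(to_predict, exp(log_pred))
result
```

Keep in mind that waggle times of 10 and 12 lie outside the range of the observed data, so those predictions are extrapolations and the intervals should be read with caution.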

