R: Confidence intervals and prediction
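I have a question about confidence intervals and prediction.
I have a dataset (called "data") with 158 observations of 2 different variables, S and N, although N is not available for some observations. I've been able to plot the regression line and the 95% confidence interval using qplot. So far so good. Now I have a second, completely different dataset (called "data2") with 127 observations of N, and I would like to know which S each of these corresponds to, and what the confidence intervals for those S values are. I can't seem to predict these values. Maybe someone can help me here?
This is what I tried: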
data.lm = lm(data$S~data$N)
newdata = data.frame(data2$N)
predict(data.lm, newdata, interval=c("confidence"))
This gives me a warning message
Warning message:
'data2' had 127 rows but variables found have 158 rows
and it gives 158 rows of fitted values with lower and upper bounds, but they clearly don't belong to my data2 N values.
fit lwr upr
1 37.88919 37.66022 38.11816
2 38.38123 38.23795 38.52451
3 NA NA NA
4 37.59720 37.26820 37.92621
5 38.09655 37.92488 38.26823
6 37.77301 37.50590 38.04012
...
When I try something like
data.lm = lm(data$S~data$N)
newdata = data.frame(N=5)
predict(data.lm, newdata, interval=c("confidence"))
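it gives me the warning and exactly the same output.
I might be being stupid here, but I have found a lot of similar questions, and the solution always seems to be what I have already tried. Why doesn't predict give me a single row with fit, upr and lwr values, and instead seems to do something with the data the lm is based on?
Thank you very much in advance.
Edit:
The data I used: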
structure(list(S = c(36.7735, 36.7735, 36.7735, 36.7735, 36.7735,
36.7735, 36.7735, 36.7735, 36.7735, 37.307, 37.307, 37.307, 37.307,
37.307, 37.307, 37.307, 37.307, 37.307, 37.307, 37.307, 37.307,
37.307, 37.307, 37.307, 37.307, 37.307, 37.307, 37.307, 37.307,
37.307, 37.307, 37.307, 37.307, 37.307, 37.307, 37.307, 37.307,
37.307, 37.307, 37.307, 37.307, 37.307, 37.307, 37.307, 37.307,
37.307, 37.307, 37.307, 37.307, 37.307, 37.307, 37.307, 37.307,
37.307, 37.307, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525,
38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525,
38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525,
38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525,
38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525,
38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525,
38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525,
38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.766,
38.766, 38.766, 38.766, 38.766, 38.766, 38.766, 38.766, 38.766,
39.639, 39.639, 39.639, 39.639, 39.639, 39.639, 39.639, 39.639,
39.639, 39.639, 39.639, 39.639, 39.639, 39.639, 39.639, 39.639,
39.639, 39.639, 39.639, 39.639, 39.639, 39.639, 39.639, 39.639,
39.639, 39.639, 39.639, 39.639, 39.639, 39.639, 39.639, 39.639,
39.639, 39.639, 39.639, 39.639, 39.639, 39.639, 39.639, 39.639,
39.639), N = c(7.740086957, 9.716043478, NA, 6.567521739, 8.572826087,
7.273521739, 8.689478261, NA, 8.112565217, 9.370289089, 8.429912766,
9.178733143, 8.136725442, 9.127494831, 7.91849608, 8.775866462,
8.733992185, 8.47272603, 8.700879331, 9.57630994, 9.184129237,
9.501760687, 10.04023077, 9.887214462, 7.947499285, 8.681177515,
10.14076961, 8.990465816, 10.35920222, 8.793812067, 8.962143225,
NA, 10.89773618, 9.646558574, NA, 8.708896587, 8.482467842, 9.490473018,
9.724324492, 9.185016805, 9.367232547, 9.447726264, 10.49359078,
9.086775124, 8.951230645, 8.438922723, 7.612619197, 8.961837755,
NA, 8.473436422, 9.487274967, 8.839257463, 8.019280063, 8.829296324,
9.089621228, 12.66471665, NA, 7.93418751, 8.442549778, 12.43150655,
12.78812747, 9.499177641, 8.88329767, 12.06733547, 8.694287059,
8.733657869, 8.976294071, 11.61797642, NA, 9.223855496, 12.14555242,
9.177782834, 10.50860256, 8.830982089, 9.338875366, 11.10966871,
9.009297476, 9.114841643, 9.145197506, 7.508668256, 8.49838577,
11.70012856, 8.859038138, 9.984367135, 11.18147471, 8.504456058,
9.30440283, 8.491741245, 9.154016228, 7.969788358, 8.890420803,
9.391405036, 8.023003384, 12.06142165, 10.0134321, 7.829115845,
8.619827639, 7.965320738, 9.718533292, 9.642541995, 9.221551363,
9.638749044, 8.728496275, 7.882667305, 8.059467865, 10.88596514,
11.52200146, 8.465388516, 10.89040717, 8.652714649, 8.570009902,
9.575021118, 10.20114206, 8.030898045, 9.325947744, 9.383493864,
NA, 10.98718012, 13.58808295, 9.987675873, 11.59305101, 8.559274188,
10.87432015, 9.530456451, NA, 13.39915598, 14.50068995, 11.4377845,
9.874845508, 8.419345084, 9.833591752, 8.734194935, NA, 8.751516192,
10.74365351, 10.94957982, 11.43931675, 9.26461008, 10.88196331,
10.01986719, 8.521178027, 8.346310841, 9.116175981, 12.55888826,
11.55922318, 11.62731629, 9.974676715, 8.659476016, 9.714302784,
11.69627731, 9.404085345, 8.417580572, 10.26841052, 8.0505316,
14.56194307, 8.496000239, 8.36501204, 9.105109509)), .Names = c("S",
"N"), class = "data.frame", row.names = c(NA, -158L))
The new dataset for which I want to predict S values:
structure(list(N = c(7.01, 8.02, 9.82, 7.83, 7.49, 8.41, 7.92,
9.7, 7.097, 8, 8.29, 8.34, 7.71, 7.87, 8.782, 8.17, 7.86, 7.665,
7.715, 10.6, 8.06, 7.53, 8.75, 8.29, 7.89, 8.94, 9.58, 9.26,
9.91, 11.6, 9.666, 10.96, 8.809, 9.142, 7.193, 8.616, 9.035,
9.123, 8.102, 8.137, 8.966, 8.333, 6.678, 8.856, 10.96, 8.401,
9.729, 8.755, 8.199, 9.004, 7.94, 8.84, 8.55, 8.26, 7.93, 9.03,
10.3, 10.1, 9.23, 8.41, 7.595, 7.351, 7.251, 8.606, 9.35, 7.786,
7.445, 9.441, 8.844, 8.411, 9.086, 8.609, 7.975, 7.203, 11.88,
6.786, 8.36, 11.1, 11.5, 11.57, 8.755, 12.64, 7.07, 10.58, 8.47,
8.13, 8.45, 9.21, 9.36, 10, 10.4, 12.5, 10.1, 10.2, 9.54, 7.78,
9.12, 8.41, 8.94, 9.22, 12.3, 9.75, 9.13, 10.4, 8.22, 8.4, 10.2,
9.95, 11.1, 10.6, 9.84, 10.1, 12.7, 8.2, 8.55, 11.6, 10.5, 8.09,
9.42, 11.2, 12.3, 7.776, 7.007, 7.306, 7.475, 7.469, 9.593)), .Names = "N",
class = "data.frame", row.names = c(NA,
-127L))
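This is due to the way you specified the model. You referenced the original data.frame directly in the formula, so predict will always end up looking for that data instead of the matching variable in newdata.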
mdl1 <- lm(mtcars$hp~mtcars$disp)
predict(mdl1,data.frame(disp=1:3))
1 2 3 4 5 6 7 8
115.74296 115.74296 92.99022 158.62312 203.25349 144.18388 203.25349 109.92351
9 10 11 12 13 14 15 16
107.34195 119.06836 119.06836 166.41155 166.41155 166.41155 252.25938 247.00875
17 18 19 20 21 22 23 24
238.25770 80.16993 78.85727 76.84453 98.28461 184.87627 178.75054 198.87796
25 26 27 28 29 30 31 32
220.75559 80.30119 98.37212 87.34579 199.31551 109.17967 177.43788 98.67840
Warning message:
'newdata' had 3 rows but variables found have 32 rows
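One way to see why the lookup fails is to inspect the term names stored in the fitted model. This is only a small sketch using the built-in mtcars data; the helper names term_names and p are made up for illustration.

```r
# Same model as above: the formula references mtcars directly.
mdl1 <- lm(mtcars$hp ~ mtcars$disp)

# The predictor stored in the model is the literal expression "mtcars$disp",
# not a column called "disp".
term_names <- attr(terms(mdl1), "term.labels")
print(term_names)
print(names(coef(mdl1)))

# predict() re-evaluates mtcars$disp, finds the full mtcars data frame in the
# workspace, and ignores the disp column of newdata.
# Warnings are suppressed here only to keep the output short.
p <- suppressWarnings(predict(mdl1, data.frame(disp = 1:3)))
print(length(p))  # matches nrow(mtcars), not the 3 rows of newdata
```

Because no column of newdata is named mtcars$disp, nothing supplied there can ever be matched to the predictor.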
What you should do is specify only the variable names in the formula, and then supply the original data source to lm via the data argument:
mdl2 <- lm(hp~disp,mtcars)
predict(mdl2,data.frame(disp=1:3))
1 2 3
46.17208 46.60964 47.04719
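Applied to the question, the same fix might look like the sketch below. The small data and data2 frames here are hypothetical stand-ins so that the snippet runs on its own (the S values are simulated, not the real ones); with the real data from the question only the lm and predict calls are needed. The names conf and pred are also just illustrative.

```r
# Hypothetical stand-in data; the real `data` and `data2` are in the question.
set.seed(1)
data <- data.frame(N = c(7.7, 9.7, NA, 6.6, 8.6, 7.3, 8.7, NA, 8.1, 9.4))
data$S <- 35 + 0.3 * data$N + rnorm(nrow(data), sd = 0.2)  # simulated response
data2 <- data.frame(N = c(7.01, 8.02, 9.82))

# Name the variables in the formula and pass the data frame via `data`.
# Rows with a missing N are dropped by lm's default NA handling.
data.lm <- lm(S ~ N, data = data)

# newdata must contain a column named N, matching the formula.
conf <- predict(data.lm, newdata = data2, interval = "confidence")
pred <- predict(data.lm, newdata = data2, interval = "prediction")

print(conf)  # one row per row of data2, with columns fit, lwr, upr
print(pred)  # same fit, wider lwr/upr
```

Note that data.frame(data2$N) produces a column named data2.N rather than N, so passing data2 itself (or data.frame(N = data2$N)) is the safer choice. Which interval is appropriate depends on the goal: interval = "confidence" describes the uncertainty of the mean S at a given N, while interval = "prediction" describes where a single new S value is likely to fall.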