Creating and plotting confidence intervals

I have fitted a GLM to my data (the code below uses a Poisson family with a log link), and I now wish to create 95% CIs and plot them with my data. I'm having a couple of issues when plotting: I can't get the intervals to capture my data; they just seem to follow the same line as the model without capturing the data points. I'm also unsure whether I've created my CIs for the mean correctly. I've included my data and code below in case anyone knows how to fix this.

Data used:

aids
 cases quarter  date
1      2       1 83.00
2      6       2 83.25
3     10       3 83.50
4      8       4 83.75
5     12       1 84.00
6      9       2 84.25
7     28       3 84.50
8     28       4 84.75
9     36       1 85.00
10    32       2 85.25
11    46       3 85.50
12    47       4 85.75
13    50       1 86.00
14    61       2 86.25
15    99       3 86.50
16    95       4 86.75
17   150       1 87.00
18   143       2 87.25
19   197       3 87.50
20   159       4 87.75
21   204       1 88.00
22   168       2 88.25
23   196       3 88.50
24   194       4 88.75
25   210       1 89.00
26   180       2 89.25
27   277       3 89.50
28   181       4 89.75
29   327       1 90.00
30   276       2 90.25
31   365       3 90.50
32   300       4 90.75
33   356       1 91.00
34   304       2 91.25
35   307       3 91.50
36   386       4 91.75
37   331       1 92.00
38   368       2 92.25
39   416       3 92.50
40   374       4 92.75
41   412       1 93.00
42   358       2 93.25
43   416       3 93.50
44   414       4 93.75
45   496       1 94.00

My code used to create the model and intervals before plotting:

#creating the model
model3 = glm(cases ~ date,
         data = aids,
         family = poisson(link='log'))

#now to add approx. 95% confidence envelope around this line
#predict again but at the linear predictor level along with standard errors
my_preds <- predict(model3, newdata=data.frame(aids), se.fit=T, type="link") 

#calculate CI limit since linear predictor is approx. Gaussian
upper <- my_preds$fit+1.96*my_preds$se.fit #this might be logit not log
lower <- my_preds$fit-1.96*my_preds$se.fit

#transform the CI limit to get one at the level of the mean
upper <- exp(upper)/(1+exp(upper)) 
lower <- exp(lower)/(1+exp(lower))

#plotting data
plot(aids$date, aids$cases,
 xlab = 'Date', ylab = 'Cases', pch = 20)

#adding CI lines
plot(aids$date, exp(my_preds$fit), type = "link",
 xlab = 'Date', ylab = 'Cases') #add title
lines(aids$date,exp(my_preds$fit+1.96*my_preds$se.fit),lwd=2,lty=2)
lines(aids$date,exp(my_preds$fit-1.96*my_preds$se.fit),lwd=2,lty=2)

The plot I currently get has no data points. The model line looks correct, but the CIs do not capture the data, so I think I have constructed them incorrectly somewhere.

Edit: Response to the OP providing the full data set.

This started out as a question about plotting data and models on the same graph, but has morphed considerably. You seem to have an answer to the original question. Below is one way to address the rest.

Looking at your (and my) plots, it seems clear that a Poisson GLM on date alone is just not a good model. To say it differently, the number of cases may vary with date, but it is also influenced by other things that are not in your model (external regressors).

Plotting just your data strongly suggests that you have at least two, and perhaps more, regimes: time frames where the growth in cases follows different models.

ggplot(aids, aes(x=date)) + geom_point(aes(y=cases))

This suggests segmented regression. As with most things in R, there is a package for that (more than one, actually). The code below uses the segmented package to fit a segmented Poisson GLM with 1 breakpoint (two regimes).

library(data.table)
library(ggplot2)
library(segmented)
setDT(aids)        # convert aids to a data.table
aids[, pred:=
       predict(
         segmented(glm(cases~date, .SD, family = poisson), seg.Z = ~date, npsi=1), 
         type='response', se.fit=TRUE)$fit]
ggplot(aids, aes(x=date))+ geom_line(aes(y=pred))+ geom_point(aes(y=cases))

Note that we need to tell segmented the number of breakpoints, but not where they are; the algorithm figures that out for you. So here, we see a regime prior to 3Q87 which is well modeled by a Poisson GLM, and a regime after that which is not. This is a fancy way of saying that "something happened" around 3Q87 which changed the course of the disease (at least in this data).
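To see where the algorithm placed the breakpoint, rather than reading it off the plot, the fitted object can be inspected directly. The following is a minimal sketch, not part of the original answer: it rebuilds the question's data as a plain data.frame, and the names `base.fit`, `seg.fit` and `bp` are chosen here for illustration. The `psi` component of a segmented fit holds the starting value, the estimate and its standard error for each breakpoint. A seed is set because segmented uses bootstrap restarts by default, so estimates can vary slightly between runs.

```r
library(segmented)

# Rebuild the question's data (values copied from the table in the question)
aids <- data.frame(
  cases = c(2, 6, 10, 8, 12, 9, 28, 28, 36, 32, 46, 47, 50, 61, 99, 95,
            150, 143, 197, 159, 204, 168, 196, 194, 210, 180, 277, 181,
            327, 276, 365, 300, 356, 304, 307, 386, 331, 368, 416, 374,
            412, 358, 416, 414, 496),
  date = seq(83, 94, by = 0.25)
)

set.seed(1)  # segmented uses bootstrap restarts by default

# Plain Poisson GLM first, then re-fit with one estimated breakpoint in date
base.fit <- glm(cases ~ date, data = aids, family = poisson)
seg.fit  <- segmented(base.fit, seg.Z = ~date, npsi = 1)

# Matrix with columns "Initial", "Est." and "St.Err", one row per breakpoint
print(seg.fit$psi)

# Estimated breakpoint location on the date scale
bp <- seg.fit$psi[, "Est."]
```

A value of `bp` near 87.5 would correspond to the 3Q87 change described above.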

The code below does the same thing, but for between 1 and 4 breakpoints.

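# fit a Poisson GLM with p.n breakpoints; return fitted means and their std. errors
# (the \(x) lambda shorthand needs R >= 4.1)
get.pred <- \(p.n, p.DT) {
  fit     <- glm(cases~date, p.DT, family=poisson)
  seg.fit <- segmented(fit, seg.Z = ~date, npsi=p.n)
  predict(seg.fit, type='response', se.fit=TRUE)[c('fit', 'se.fit')]
}
# one copy of the data per breakpoint count (1 to 4), stacked for faceting
gg.dt <- rbindlist(lapply(1:4, \(x) { copy(aids)[, c('pred', 'se'):=get.pred(x, .SD)][, npsi:=x] } ))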
ggplot(gg.dt, aes(x=date))+
  geom_ribbon(aes(ymin=pred-1.96*se, ymax=pred+1.96*se), fill='grey80')+
  geom_line(aes(y=pred))+
  geom_point(aes(y=cases))+
  facet_wrap(~npsi)

Note that the location of the first breakpoint does not seem to change, and also that, notwithstanding the use of the Poisson GLM, the growth appears linear in all but the first regime.

There are goodness-of-fit metrics described in the package documentation which can help you decide how many breakpoints are most consistent with your data.
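As one generic way to compare the four fits, the sketch below tabulates AIC and BIC for each breakpoint count. This is an assumption on my part rather than the specific metric the package documentation recommends, and information criteria are only approximate for segmented models, so treat the table as a rough guide. The names `base.fit`, `fits` and `ic` are illustrative, and the data are rebuilt as a plain data.frame to keep the block self-contained.

```r
library(segmented)

# Rebuild the question's data (values copied from the table in the question)
aids <- data.frame(
  cases = c(2, 6, 10, 8, 12, 9, 28, 28, 36, 32, 46, 47, 50, 61, 99, 95,
            150, 143, 197, 159, 204, 168, 196, 194, 210, 180, 277, 181,
            327, 276, 365, 300, 356, 304, 307, 386, 331, 368, 416, 374,
            412, 358, 416, 414, 496),
  date = seq(83, 94, by = 0.25)
)

set.seed(1)  # segmented uses bootstrap restarts by default

base.fit <- glm(cases ~ date, data = aids, family = poisson)

# One segmented fit per breakpoint count, 1 to 4
fits <- lapply(1:4, function(k) segmented(base.fit, seg.Z = ~date, npsi = k))

# Lower AIC/BIC indicates a better trade-off between fit and complexity
ic <- data.frame(
  npsi = 1:4,
  AIC  = sapply(fits, AIC),
  BIC  = sapply(fits, BIC)
)
print(ic)
```

BIC penalizes extra breakpoints more heavily than AIC, so the two may disagree on the preferred count for a series this short.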

Finally, there is also the mcp package, which is a bit more powerful but also a bit more complex to use.

Original response: Here is one way that builds the model predictions and standard errors in a data.table, then plots them using ggplot.

library(data.table)
library(ggplot2)
setDT(aids)        # convert aids to a data.table
aids[, c('pred', 'se', 'resid.scale'):=predict(glm(cases~date, data=.SD, family=poisson), type='response', se.fit=TRUE)]
ggplot(aids, aes(x=date))+
  geom_ribbon(aes(ymin=pred-1.96*se, ymax=pred+1.96*se), fill='grey80')+
  geom_line(aes(y=pred))+
  geom_point(aes(y=cases))

Or, you could let ggplot do all the work for you.

ggplot(aids, aes(x=date, y=cases))+
  stat_smooth(method = glm, method.args=list(family=poisson))+
  geom_point()
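For the base-R approach in the question, here is a minimal sketch of the same envelope, based on two observations about the posted code. First, a second call to plot() starts a new figure, which is why the points vanish; lines() adds to the existing one. Second, the model uses a log link, so the limits are back-transformed with exp() (taken here from the fitted family via `linkinv`), not with the inverse logit. The column names `pred`, `lower` and `upper` are illustrative.

```r
# Rebuild the question's data (values copied from the table in the question)
aids <- data.frame(
  cases = c(2, 6, 10, 8, 12, 9, 28, 28, 36, 32, 46, 47, 50, 61, 99, 95,
            150, 143, 197, 159, 204, 168, 196, 194, 210, 180, 277, 181,
            327, 276, 365, 300, 356, 304, 307, 386, 331, 368, 416, 374,
            412, 358, 416, 414, 496),
  date = seq(83, 94, by = 0.25)
)

fit <- glm(cases ~ date, data = aids, family = poisson(link = "log"))

# Predict on the linear-predictor (link) scale, with standard errors
pr <- predict(fit, type = "link", se.fit = TRUE)

# Inverse link of the fitted family: exp() for a log link
inv <- family(fit)$linkinv

aids$pred  <- inv(pr$fit)
aids$lower <- inv(pr$fit - 1.96 * pr$se.fit)
aids$upper <- inv(pr$fit + 1.96 * pr$se.fit)

# Draw the data once, then ADD the fit and limits with lines()
plot(aids$date, aids$cases, xlab = "Date", ylab = "Cases", pch = 20)
lines(aids$date, aids$pred,  lwd = 2)
lines(aids$date, aids$lower, lwd = 2, lty = 2)
lines(aids$date, aids$upper, lwd = 2, lty = 2)
```

Note that this band is a confidence interval for the fitted mean, not a prediction interval for individual observations, so it is not expected to contain most of the data points even when the model fits well.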
