[英]How to plot predictions of binomial GLM that has both continuous and categorical variables
I have a binomial GLM in R, with several predictors that are both continuous and categorical. 我在R中有一个二项式GLM,其中有几个连续且分类的预测变量。
The response variable is "Presence", which is binary (0/1). 响应变量是“ Presence”,它是二进制(0/1)。 Length is a continuous variable, while all others are categorical.
长度是一个连续变量,而其他所有变量都是分类变量。
I am trying to plot predictions for each of the variables in the final model, particularly for "length", but I'm having difficulties. 我试图绘制最终模型中每个变量的预测,尤其是“长度”的预测,但是我遇到了困难。
My data are the following: 我的数据如下:
MyData<-structure(list(site = structure(c(3L, 1L, 3L, 2L, 1L, 4L, 3L,
4L, 1L, 2L, 4L, 5L, 5L, 1L, 4L, 3L, 2L, 4L, 1L, 4L, 5L, 1L, 5L,
4L, 3L, 1L, 3L, 5L, 5L, 4L, 4L, 3L, 1L, 5L, 1L, 3L, 1L, 4L, 4L,
3L, 4L, 4L, 2L, 3L, 1L, 4L, 2L, 1L, 1L, 4L, 4L, 4L, 1L, 3L, 3L,
2L, 1L, 4L, 2L, 5L, 5L, 3L, 3L, 2L, 5L, 2L, 4L, 5L, 2L, 4L, 4L,
2L, 5L, 2L, 3L, 5L, 4L, 4L, 5L, 1L, 1L, 3L, 2L, 4L, 3L, 1L, 4L,
3L, 1L, 4L, 3L, 3L, 4L, 5L, 1L, 3L, 2L, 3L, 2L, 3L, 2L, 1L, 1L,
5L, 5L, 1L, 5L, 2L, 3L, 4L, 4L, 3L, 2L, 3L, 3L, 5L, 3L, 3L, 3L,
5L, 1L, 5L, 2L, 3L, 4L, 5L, 5L, 1L, 4L, 2L, 5L, 3L, 2L, 5L, 4L,
3L, 3L, 3L, 1L, 1L, 4L, 1L, 2L, 4L, 5L, 1L, 1L, 2L, 2L, 5L, 3L,
4L, 4L, 1L, 5L, 2L, 4L, 3L, 1L, 1L, 3L, 2L, 1L, 3L, 4L, 3L, 1L,
5L, 3L, 3L, 3L, 4L, 1L, 1L, 3L, 4L, 3L, 1L, 1L, 1L, 1L, 5L, 1L,
3L, 4L, 3L, 2L, 1L, 1L, 2L, 5L, 2L, 1L, 5L, 3L, 1L, 4L, 1L, 3L,
3L, 3L, 3L, 5L, 1L, 4L, 1L, 1L, 3L, 3L, 4L, 1L, 3L, 3L, 4L, 2L,
5L, 5L, 5L, 1L, 4L, 4L, 3L, 1L, 2L, 3L, 1L, 3L, 1L, 1L, 4L, 3L,
1L, 1L, 5L, 3L, 1L), .Label = c("R1a", "R1b", "R2", "Za", "Zb"
), class = "factor"), species = structure(c(1L, 1L, 3L, 3L, 3L,
1L, 3L, 1L, 4L, 3L, 1L, 1L, 1L, 3L, 1L, 3L, 3L, 1L, 3L, 1L, 1L,
1L, 1L, 4L, 3L, 4L, 3L, 1L, 1L, 1L, 1L, 1L, 4L, 1L, 3L, 1L, 4L,
3L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 2L, 3L, 1L, 1L, 3L, 1L, 1L, 1L,
1L, 3L, 3L, 1L, 2L, 3L, 1L, 2L, 1L, 1L, 3L, 1L, 3L, 1L, 1L, 1L,
1L, 1L, 2L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 1L, 3L, 1L, 3L,
3L, 1L, 3L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 3L, 4L, 3L, 1L,
1L, 3L, 1L, 1L, 4L, 1L, 3L, 3L, 1L, 1L, 1L, 3L, 3L, 3L, 2L, 4L,
1L, 3L, 1L, 3L, 1L, 3L, 3L, 1L, 1L, 1L, 3L, 4L, 3L, 1L, 1L, 3L,
1L, 1L, 4L, 1L, 3L, 1L, 3L, 1L, 2L, 1L, 1L, 2L, 3L, 3L, 3L, 3L,
1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 1L,
3L, 1L, 4L, 3L, 1L, 4L, 1L, 1L, 3L, 1L, 1L, 3L, 1L, 1L, 3L, 3L,
1L, 4L, 3L, 4L, 3L, 1L, 1L, 2L, 3L, 1L, 1L, 1L, 2L, 3L, 4L, 3L,
1L, 1L, 4L, 1L, 1L, 2L, 1L, 1L, 3L, 3L, 1L, 3L, 2L, 4L, 3L, 3L,
1L, 3L, 1L, 4L, 1L, 1L, 4L, 1L, 3L, 1L, 3L, 3L, 3L, 1L, 3L, 1L,
1L, 1L, 3L, 1L, 1L, 1L, 3L), .Label = c("Monogyna", "Other",
"Prunus", "Rosa"), class = "factor"), aspect = structure(c(4L,
4L, 4L, 4L, 4L, 3L, 4L, 3L, 4L, 4L, 3L, 4L, 4L, 4L, 3L, 3L, 4L,
3L, 4L, 3L, 1L, 4L, 4L, 3L, 2L, 4L, 4L, 4L, 4L, 3L, 3L, 4L, 4L,
4L, 4L, 2L, 4L, 3L, 3L, 1L, 3L, 3L, 4L, 4L, 4L, 3L, 4L, 4L, 4L,
3L, 3L, 3L, 4L, 1L, 3L, 4L, 4L, 3L, 4L, 4L, 4L, 3L, 3L, 4L, 1L,
4L, 3L, 4L, 4L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 3L, 3L, 4L, 4L, 4L,
2L, 4L, 3L, 3L, 4L, 3L, 4L, 4L, 3L, 4L, 3L, 3L, 4L, 4L, 3L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 4L, 4L,
3L, 2L, 3L, 1L, 2L, 5L, 2L, 4L, 4L, 4L, 3L, 3L, 1L, 2L, 4L, 3L,
4L, 4L, 3L, 4L, 4L, 3L, 4L, 4L, 3L, 4L, 4L, 3L, 4L, 4L, 3L, 1L,
4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 4L, 4L, 4L, 3L, 4L, 4L, 4L, 4L,
4L, 4L, 3L, 3L, 3L, 4L, 4L, 3L, 4L, 2L, 3L, 4L, 4L, 2L, 3L, 2L,
4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 2L, 4L, 3L, 4L, 4L, 4L, 3L, 4L, 4L, 4L, 3L, 4L, 4L, 4L, 3L,
3L, 4L, 2L, 5L, 3L, 4L, 2L, 4L, 4L, 4L, 3L, 3L, 3L, 4L, 4L, 2L,
4L, 3L, 4L, 4L, 3L, 4L, 4L, 4L, 3L, 2L, 4L), .Label = c("East",
"Flat", "North", "South", "West"), class = "factor"), length = c(260L,
60L, 60L, 40L, 240L, 80L, 30L, 100L, 100L, 200L, 70L, 50L, 60L,
35L, 120L, 60L, 500L, 40L, 20L, 70L, 250L, 80L, 50L, 130L, 350L,
170L, 50L, 60L, 90L, 50L, 40L, 110L, 60L, 70L, 70L, 500L, 140L,
50L, 50L, 360L, 50L, 150L, 60L, 270L, 280L, 130L, 130L, 50L,
60L, 30L, 70L, 70L, 60L, 400L, 20L, 30L, 70L, 160L, 340L, 100L,
210L, 60L, 70L, 130L, 50L, 40L, 50L, 80L, 390L, 40L, 110L, 130L,
40L, 230L, 120L, 70L, 80L, 80L, 90L, 70L, 150L, 120L, 50L, 100L,
120L, 10L, 40L, 80L, 180L, 160L, 200L, 40L, 70L, 90L, 50L, 40L,
80L, 80L, 70L, 480L, 90L, 60L, 100L, 140L, 190L, 20L, 70L, 360L,
70L, 130L, 60L, 50L, 320L, 210L, 130L, 180L, 90L, 20L, 300L,
90L, 50L, 130L, 70L, 70L, 40L, 40L, 50L, 40L, 100L, 20L, 70L,
100L, 340L, 70L, 110L, 40L, 230L, 200L, 80L, 35L, 110L, 200L,
50L, 110L, 100L, 50L, 150L, 110L, 50L, 50L, 40L, 70L, 80L, 60L,
100L, 90L, 40L, 300L, 140L, 180L, 140L, 40L, 190L, 100L, 170L,
40L, 120L, 15L, 70L, 340L, 40L, 40L, 70L, 60L, 130L, 140L, 170L,
120L, 90L, 130L, 210L, 50L, 180L, 120L, 100L, 50L, 90L, 70L,
360L, 80L, 30L, 170L, 70L, 300L, 40L, 130L, 120L, 90L, 40L, 40L,
140L, 80L, 400L, 70L, 80L, 60L, 420L, 320L, 200L, 40L, 50L, 70L,
50L, 80L, 50L, 110L, 100L, 120L, 170L, 20L, 110L, 20L, 20L, 30L,
30L, 90L, 150L, 80L, 40L, 90L, 300L, 30L, 70L, 50L, 90L, 200L
), sun = structure(c(1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L,
3L, 3L, 3L, 3L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 2L, 3L, 1L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 1L, 1L, 3L, 3L, 3L, 3L, 1L, 3L, 3L, 1L, 3L,
3L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 3L, 3L, 3L,
3L, 2L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 3L, 3L, 1L, 2L, 1L,
1L, 3L, 3L, 3L, 2L, 3L, 3L, 2L, 3L, 3L, 1L, 3L, 3L, 3L, 1L, 3L,
1L, 3L, 3L, 2L, 1L, 3L, 3L, 1L, 1L, 3L, 1L, 3L, 3L, 1L, 1L, 1L,
2L, 1L, 1L, 3L, 3L, 1L, 1L, 1L, 3L, 2L, 1L, 3L, 1L, 1L, 3L, 3L,
1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 1L, 1L, 1L,
3L, 3L, 3L, 1L, 3L, 3L, 1L, 3L, 3L, 1L, 3L, 3L, 1L, 3L, 3L, 3L,
3L, 1L, 3L, 1L, 3L, 1L, 1L, 3L, 3L, 3L, 1L, 3L, 3L, 3L, 1L, 1L,
1L, 1L, 3L, 3L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L,
3L, 3L, 3L, 3L, 2L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 2L,
3L, 3L, 3L, 3L, 3L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 1L,
1L, 3L, 3L, 3L, 3L, 1L, 3L, 1L, 3L, 3L, 3L, 1L, 1L, 3L, 3L, 2L,
3L, 3L), .Label = c("Half", "Shade", "Sun"), class = "factor"),
leaf = structure(c(2L, 2L, 4L, 2L, 2L, 2L, 2L, 2L, 4L, 2L,
2L, 4L, 4L, 4L, 2L, 2L, 2L, 4L, 4L, 2L, 2L, 4L, 2L, 2L, 1L,
2L, 2L, 4L, 2L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 2L,
2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 2L,
2L, 4L, 1L, 2L, 4L, 1L, 2L, 4L, 2L, 4L, 2L, 2L, 2L, 1L, 4L,
4L, 1L, 4L, 1L, 2L, 4L, 3L, 2L, 2L, 2L, 2L, 4L, 2L, 4L, 2L,
2L, 2L, 2L, 2L, 4L, 1L, 2L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 1L,
1L, 4L, 2L, 2L, 1L, 4L, 2L, 2L, 2L, 1L, 4L, 2L, 2L, 1L, 1L,
1L, 2L, 4L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 4L, 2L, 2L, 2L, 2L,
4L, 2L, 2L, 4L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L,
2L, 2L, 1L, 2L, 2L, 2L, 2L, 4L, 2L, 2L, 2L, 4L, 4L, 1L, 1L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 4L, 2L, 2L, 2L, 4L, 2L, 2L, 2L,
1L, 1L, 2L, 1L, 2L, 2L, 4L, 2L, 2L, 2L, 2L, 2L, 4L, 1L, 2L,
4L, 2L, 2L, 1L, 2L, 2L, 4L, 2L, 4L, 4L, 2L, 2L, 1L, 2L, 2L,
2L, 2L, 4L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 4L, 1L, 1L, 2L,
1L, 2L, 2L, 2L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 2L, 4L, 2L,
2L), .Label = c("Large", "Medium", "Scarce", "Small"), class = "factor"),
Presence = c(0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L,
0L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L,
0L, 0L, 0L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L,
0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L,
1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L,
0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L,
0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
1L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 0L
)), .Names = c("site", "species", "aspect", "length", "sun",
"leaf", "Presence"), row.names = c(NA, 236L), class = "data.frame")
(note that this is a reduced dataset, and I have already removed variables that were dropped during model selection) (请注意,这是一个简化的数据集,并且我已经删除了在模型选择期间删除的变量)
The optimal model is: 最佳模型是:
model <- glm(Presence ~ site + species + aspect + length + sun
+ leaf, data=MyData, family=binomial)
I tried the following, but it wants the other variables too, so I get an error: 我尝试了以下操作,但是它也需要其他变量,所以出现错误:
plot(MyData$length, MyData$Presence)
mydat1 <- data.frame(length = seq(from = 10, to = 500, by = 1)
pred1 <- predict(model, newdata = mydat1, type = "response")
lines(MyData$length, pred1)
So I tried specifying all variables, but then it only puts a horizontal line through the presence data points (and that means I need to specify all possible combinations of factor variables I suppose): 因此,我尝试指定所有变量,但随后只在存在数据点上划了一条水平线(这意味着我需要指定所有可能的因子变量组合):
plot(MyData$length, MyData$Presence)
mydat2 <- data.frame(length = seq(from = 10, to = 500, by = 1),
site = "R1a",
species = "Monogyna",
aspect = "Flat",
sun = "Sun",
leaf = "Scarce")
pred2 <- predict(model, newdata = mydat2, type = "response")
lines(MyData$length, pred2)
Finally, I tried the following code: 最后,我尝试了以下代码:
pred <- predict(model, type = "response")
par(mfrow=c(2,2))
for(i in names(MyData)){
plot(MyData[,i],pred,xlab=i, ylab="Probability")
}
I am confused by this last one, as I am not able to obtain the curve, plus the output gives me predicted values for variables that are not even in the optimal model. 我对最后一个感到困惑,因为我无法获得曲线,加上输出为我提供了即使在最佳模型中也无法预测的变量的预测值。
What I should expect under this model, is a sinusoidal curve, I suppose. 我认为,在这种模型下,我应该期望的是正弦曲线。 But that's not what I'm getting.
但这不是我得到的。
How can I produce a meaningful plot of predictions? 如何得出有意义的预测图?
Any help would be greatly appreciated. 任何帮助将不胜感激。
I would use the effects
package for some easier results for a single predictor. 我会使用
effects
包为单个预测变量提供一些更简单的结果。 Here is how: 方法如下:
library(effects)
fit <- as.data.frame(effect('length', model, xlevels = 100))
Plotting is easy (although note the overplotting): 绘图很容易(尽管要注意绘图):
plot(MyData$length, MyData$Presence)
lines(fit$length, fit$fit)
Or we can use ggplot2
: 或者我们可以使用
ggplot2
:
library(ggplot2)
ggplot() +
geom_count(aes(length, Presence), MyData) +
geom_line(aes(length, fit), fit, size = 1, col = 'red') +
geom_ribbon(aes(length, ymin = lower, ymax = upper), fit, alpha = 0.15) +
scale_size_area()
We can see that the effect of length is not very impressive. 我们可以看到长度的影响不是很令人印象深刻。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.