
How to plot predictions of binomial GLM that has both continuous and categorical variables

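I have a binomial GLM in R with several continuous and categorical predictors.

The response variable is "Presence", which is binary (0/1). Length is a continuous variable, while all the others are categorical.

I am trying to plot the predictions for each variable in the final model, in particular for "length", but I am having difficulty.

My data are as follows: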

MyData<-structure(list(site = structure(c(3L, 1L, 3L, 2L, 1L, 4L, 3L, 
4L, 1L, 2L, 4L, 5L, 5L, 1L, 4L, 3L, 2L, 4L, 1L, 4L, 5L, 1L, 5L, 
4L, 3L, 1L, 3L, 5L, 5L, 4L, 4L, 3L, 1L, 5L, 1L, 3L, 1L, 4L, 4L, 
3L, 4L, 4L, 2L, 3L, 1L, 4L, 2L, 1L, 1L, 4L, 4L, 4L, 1L, 3L, 3L, 
2L, 1L, 4L, 2L, 5L, 5L, 3L, 3L, 2L, 5L, 2L, 4L, 5L, 2L, 4L, 4L, 
2L, 5L, 2L, 3L, 5L, 4L, 4L, 5L, 1L, 1L, 3L, 2L, 4L, 3L, 1L, 4L, 
3L, 1L, 4L, 3L, 3L, 4L, 5L, 1L, 3L, 2L, 3L, 2L, 3L, 2L, 1L, 1L, 
5L, 5L, 1L, 5L, 2L, 3L, 4L, 4L, 3L, 2L, 3L, 3L, 5L, 3L, 3L, 3L, 
5L, 1L, 5L, 2L, 3L, 4L, 5L, 5L, 1L, 4L, 2L, 5L, 3L, 2L, 5L, 4L, 
3L, 3L, 3L, 1L, 1L, 4L, 1L, 2L, 4L, 5L, 1L, 1L, 2L, 2L, 5L, 3L, 
4L, 4L, 1L, 5L, 2L, 4L, 3L, 1L, 1L, 3L, 2L, 1L, 3L, 4L, 3L, 1L, 
5L, 3L, 3L, 3L, 4L, 1L, 1L, 3L, 4L, 3L, 1L, 1L, 1L, 1L, 5L, 1L, 
3L, 4L, 3L, 2L, 1L, 1L, 2L, 5L, 2L, 1L, 5L, 3L, 1L, 4L, 1L, 3L, 
3L, 3L, 3L, 5L, 1L, 4L, 1L, 1L, 3L, 3L, 4L, 1L, 3L, 3L, 4L, 2L, 
5L, 5L, 5L, 1L, 4L, 4L, 3L, 1L, 2L, 3L, 1L, 3L, 1L, 1L, 4L, 3L, 
1L, 1L, 5L, 3L, 1L), .Label = c("R1a", "R1b", "R2", "Za", "Zb"
), class = "factor"), species = structure(c(1L, 1L, 3L, 3L, 3L, 
1L, 3L, 1L, 4L, 3L, 1L, 1L, 1L, 3L, 1L, 3L, 3L, 1L, 3L, 1L, 1L, 
1L, 1L, 4L, 3L, 4L, 3L, 1L, 1L, 1L, 1L, 1L, 4L, 1L, 3L, 1L, 4L, 
3L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 2L, 3L, 1L, 1L, 3L, 1L, 1L, 1L, 
1L, 3L, 3L, 1L, 2L, 3L, 1L, 2L, 1L, 1L, 3L, 1L, 3L, 1L, 1L, 1L, 
1L, 1L, 2L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 1L, 3L, 1L, 3L, 
3L, 1L, 3L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 3L, 4L, 3L, 1L, 
1L, 3L, 1L, 1L, 4L, 1L, 3L, 3L, 1L, 1L, 1L, 3L, 3L, 3L, 2L, 4L, 
1L, 3L, 1L, 3L, 1L, 3L, 3L, 1L, 1L, 1L, 3L, 4L, 3L, 1L, 1L, 3L, 
1L, 1L, 4L, 1L, 3L, 1L, 3L, 1L, 2L, 1L, 1L, 2L, 3L, 3L, 3L, 3L, 
1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 1L, 
3L, 1L, 4L, 3L, 1L, 4L, 1L, 1L, 3L, 1L, 1L, 3L, 1L, 1L, 3L, 3L, 
1L, 4L, 3L, 4L, 3L, 1L, 1L, 2L, 3L, 1L, 1L, 1L, 2L, 3L, 4L, 3L, 
1L, 1L, 4L, 1L, 1L, 2L, 1L, 1L, 3L, 3L, 1L, 3L, 2L, 4L, 3L, 3L, 
1L, 3L, 1L, 4L, 1L, 1L, 4L, 1L, 3L, 1L, 3L, 3L, 3L, 1L, 3L, 1L, 
1L, 1L, 3L, 1L, 1L, 1L, 3L), .Label = c("Monogyna", "Other", 
"Prunus", "Rosa"), class = "factor"), aspect = structure(c(4L, 
4L, 4L, 4L, 4L, 3L, 4L, 3L, 4L, 4L, 3L, 4L, 4L, 4L, 3L, 3L, 4L, 
3L, 4L, 3L, 1L, 4L, 4L, 3L, 2L, 4L, 4L, 4L, 4L, 3L, 3L, 4L, 4L, 
4L, 4L, 2L, 4L, 3L, 3L, 1L, 3L, 3L, 4L, 4L, 4L, 3L, 4L, 4L, 4L, 
3L, 3L, 3L, 4L, 1L, 3L, 4L, 4L, 3L, 4L, 4L, 4L, 3L, 3L, 4L, 1L, 
4L, 3L, 4L, 4L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 3L, 3L, 4L, 4L, 4L, 
2L, 4L, 3L, 3L, 4L, 3L, 4L, 4L, 3L, 4L, 3L, 3L, 4L, 4L, 3L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 4L, 4L, 
3L, 2L, 3L, 1L, 2L, 5L, 2L, 4L, 4L, 4L, 3L, 3L, 1L, 2L, 4L, 3L, 
4L, 4L, 3L, 4L, 4L, 3L, 4L, 4L, 3L, 4L, 4L, 3L, 4L, 4L, 3L, 1L, 
4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 4L, 4L, 4L, 3L, 4L, 4L, 4L, 4L, 
4L, 4L, 3L, 3L, 3L, 4L, 4L, 3L, 4L, 2L, 3L, 4L, 4L, 2L, 3L, 2L, 
4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 2L, 4L, 3L, 4L, 4L, 4L, 3L, 4L, 4L, 4L, 3L, 4L, 4L, 4L, 3L, 
3L, 4L, 2L, 5L, 3L, 4L, 2L, 4L, 4L, 4L, 3L, 3L, 3L, 4L, 4L, 2L, 
4L, 3L, 4L, 4L, 3L, 4L, 4L, 4L, 3L, 2L, 4L), .Label = c("East", 
"Flat", "North", "South", "West"), class = "factor"), length = c(260L, 
60L, 60L, 40L, 240L, 80L, 30L, 100L, 100L, 200L, 70L, 50L, 60L, 
35L, 120L, 60L, 500L, 40L, 20L, 70L, 250L, 80L, 50L, 130L, 350L, 
170L, 50L, 60L, 90L, 50L, 40L, 110L, 60L, 70L, 70L, 500L, 140L, 
50L, 50L, 360L, 50L, 150L, 60L, 270L, 280L, 130L, 130L, 50L, 
60L, 30L, 70L, 70L, 60L, 400L, 20L, 30L, 70L, 160L, 340L, 100L, 
210L, 60L, 70L, 130L, 50L, 40L, 50L, 80L, 390L, 40L, 110L, 130L, 
40L, 230L, 120L, 70L, 80L, 80L, 90L, 70L, 150L, 120L, 50L, 100L, 
120L, 10L, 40L, 80L, 180L, 160L, 200L, 40L, 70L, 90L, 50L, 40L, 
80L, 80L, 70L, 480L, 90L, 60L, 100L, 140L, 190L, 20L, 70L, 360L, 
70L, 130L, 60L, 50L, 320L, 210L, 130L, 180L, 90L, 20L, 300L, 
90L, 50L, 130L, 70L, 70L, 40L, 40L, 50L, 40L, 100L, 20L, 70L, 
100L, 340L, 70L, 110L, 40L, 230L, 200L, 80L, 35L, 110L, 200L, 
50L, 110L, 100L, 50L, 150L, 110L, 50L, 50L, 40L, 70L, 80L, 60L, 
100L, 90L, 40L, 300L, 140L, 180L, 140L, 40L, 190L, 100L, 170L, 
40L, 120L, 15L, 70L, 340L, 40L, 40L, 70L, 60L, 130L, 140L, 170L, 
120L, 90L, 130L, 210L, 50L, 180L, 120L, 100L, 50L, 90L, 70L, 
360L, 80L, 30L, 170L, 70L, 300L, 40L, 130L, 120L, 90L, 40L, 40L, 
140L, 80L, 400L, 70L, 80L, 60L, 420L, 320L, 200L, 40L, 50L, 70L, 
50L, 80L, 50L, 110L, 100L, 120L, 170L, 20L, 110L, 20L, 20L, 30L, 
30L, 90L, 150L, 80L, 40L, 90L, 300L, 30L, 70L, 50L, 90L, 200L
), sun = structure(c(1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 
3L, 3L, 3L, 3L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 2L, 3L, 1L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 1L, 1L, 3L, 3L, 3L, 3L, 1L, 3L, 3L, 1L, 3L, 
3L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 3L, 3L, 3L, 
3L, 2L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 3L, 3L, 1L, 2L, 1L, 
1L, 3L, 3L, 3L, 2L, 3L, 3L, 2L, 3L, 3L, 1L, 3L, 3L, 3L, 1L, 3L, 
1L, 3L, 3L, 2L, 1L, 3L, 3L, 1L, 1L, 3L, 1L, 3L, 3L, 1L, 1L, 1L, 
2L, 1L, 1L, 3L, 3L, 1L, 1L, 1L, 3L, 2L, 1L, 3L, 1L, 1L, 3L, 3L, 
1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 1L, 1L, 1L, 
3L, 3L, 3L, 1L, 3L, 3L, 1L, 3L, 3L, 1L, 3L, 3L, 1L, 3L, 3L, 3L, 
3L, 1L, 3L, 1L, 3L, 1L, 1L, 3L, 3L, 3L, 1L, 3L, 3L, 3L, 1L, 1L, 
1L, 1L, 3L, 3L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 
3L, 3L, 3L, 3L, 2L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 2L, 
3L, 3L, 3L, 3L, 3L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 1L, 
1L, 3L, 3L, 3L, 3L, 1L, 3L, 1L, 3L, 3L, 3L, 1L, 1L, 3L, 3L, 2L, 
3L, 3L), .Label = c("Half", "Shade", "Sun"), class = "factor"), 
    leaf = structure(c(2L, 2L, 4L, 2L, 2L, 2L, 2L, 2L, 4L, 2L, 
    2L, 4L, 4L, 4L, 2L, 2L, 2L, 4L, 4L, 2L, 2L, 4L, 2L, 2L, 1L, 
    2L, 2L, 4L, 2L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 2L, 
    2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 2L, 
    2L, 4L, 1L, 2L, 4L, 1L, 2L, 4L, 2L, 4L, 2L, 2L, 2L, 1L, 4L, 
    4L, 1L, 4L, 1L, 2L, 4L, 3L, 2L, 2L, 2L, 2L, 4L, 2L, 4L, 2L, 
    2L, 2L, 2L, 2L, 4L, 1L, 2L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 
    1L, 4L, 2L, 2L, 1L, 4L, 2L, 2L, 2L, 1L, 4L, 2L, 2L, 1L, 1L, 
    1L, 2L, 4L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 4L, 2L, 2L, 2L, 2L, 
    4L, 2L, 2L, 4L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 
    2L, 2L, 1L, 2L, 2L, 2L, 2L, 4L, 2L, 2L, 2L, 4L, 4L, 1L, 1L, 
    2L, 2L, 2L, 1L, 1L, 1L, 1L, 4L, 2L, 2L, 2L, 4L, 2L, 2L, 2L, 
    1L, 1L, 2L, 1L, 2L, 2L, 4L, 2L, 2L, 2L, 2L, 2L, 4L, 1L, 2L, 
    4L, 2L, 2L, 1L, 2L, 2L, 4L, 2L, 4L, 4L, 2L, 2L, 1L, 2L, 2L, 
    2L, 2L, 4L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 4L, 1L, 1L, 2L, 
    1L, 2L, 2L, 2L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 2L, 4L, 2L, 
    2L), .Label = c("Large", "Medium", "Scarce", "Small"), class = "factor"), 
    Presence = c(0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 
    0L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 
    1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 
    0L, 0L, 0L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 
    1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 
    0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 
    1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 
    0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 
    0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 
    0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 
    1L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 0L
    )), .Names = c("site", "species", "aspect", "length", "sun", 
"leaf", "Presence"), row.names = c(NA, 236L), class = "data.frame")

(Note that this is a simplified dataset, and I have already removed the variables that were dropped during model selection.)

The best model is:

model <- glm(Presence ~ site + species + aspect + length + sun 
                + leaf, data=MyData, family=binomial)

I tried the following, but it also requires the other variables, so it throws an error:

plot(MyData$length, MyData$Presence)
mydat1 <- data.frame(length = seq(from = 10, to = 500, by = 1))
pred1 <- predict(model, newdata = mydat1, type = "response")
lines(MyData$length, pred1)

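So I tried specifying all the variables, but then it only draws a horizontal line through the presence data points (and it would mean I need to specify every possible combination of the factor variables):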

plot(MyData$length, MyData$Presence)
mydat2 <- data.frame(length = seq(from = 10, to = 500, by = 1), 
                     site = "R1a", 
                     species = "Monogyna",
                     aspect = "Flat", 
                     sun = "Sun", 
                     leaf = "Scarce")
pred2 <- predict(model, newdata = mydat2, type = "response")
lines(MyData$length, pred2)

Finally, I tried the following code:

pred <- predict(model, type = "response")
par(mfrow=c(2,2))
for(i in names(MyData)){
   plot(MyData[,i],pred,xlab=i, ylab="Probability")
}

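I am confused by this last one, because I cannot get a curve, and the output also gives me predicted values for variables that are not even predictors in the best model.

I think that with this kind of model I should expect a sigmoidal curve, but that is not what I get.

How can I produce meaningful prediction plots?

Any help would be greatly appreciated.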

I would use the effects package, which gives simpler results for individual predictors. Here is how:

library(effects)
fit <- as.data.frame(effect('length', model, xlevels = 100))

Plotting is easy (though watch out for overplotting):

plot(MyData$length, MyData$Presence)
lines(fit$length, fit$fit)
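
If you would rather stay with predict(), a minimal sketch of the same idea follows. It uses simulated stand-in data (the names dat, m and grid and all simulated values are assumptions for illustration, not the poster's data or model). Every factor is held at one chosen level, and the x values passed to lines() come from the prediction grid, so they match the predictions in length and order. As far as I know, effect() averages over the levels of the other predictors instead of fixing them, so the two curves need not coincide.

```r
# Simulated stand-in for MyData (hypothetical values, for illustration only)
set.seed(1)
n <- 200
dat <- data.frame(
  length  = round(runif(n, 10, 500)),
  site    = factor(sample(c("R1a", "R1b", "R2"), n, replace = TRUE)),
  species = factor(sample(c("Monogyna", "Prunus"), n, replace = TRUE))
)
eta <- -1 + 0.004 * dat$length + 0.5 * (dat$species == "Prunus")
dat$Presence <- rbinom(n, 1, plogis(eta))

m <- glm(Presence ~ site + species + length, data = dat, family = binomial)

# Grid over length; each factor is fixed at a single level that exists in the data
grid <- data.frame(
  length  = seq(min(dat$length), max(dat$length), length.out = 100),
  site    = factor("R1a", levels = levels(dat$site)),
  species = factor("Monogyna", levels = levels(dat$species))
)
grid$prob <- predict(m, newdata = grid, type = "response")

plot(dat$length, dat$Presence, xlab = "length", ylab = "Presence")
lines(grid$length, grid$prob)  # x comes from the grid, not from the raw data
```

Changing the level chosen for a factor shifts the curve up or down on the logit scale, so the picture depends on which levels are fixed.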


Or we can use ggplot2:

library(ggplot2)
ggplot() +
  geom_count(aes(length, Presence), MyData) +
  geom_line(aes(length, fit), fit, size = 1, col = 'red') +
  geom_ribbon(aes(length, ymin = lower, ymax = upper), fit, alpha = 0.15) +
  scale_size_area()
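
The ribbon above uses the lower and upper columns returned by effects. A comparable band can be sketched by hand with predict(): compute a Wald interval on the link (logit) scale, then back-transform with plogis() so the limits stay inside [0, 1]. This is a sketch on simulated stand-in data (dat, m and grid are assumed names), and the interval that effects reports may be computed differently.

```r
# Simulated stand-in data (hypothetical values, for illustration only)
set.seed(2)
n <- 200
dat <- data.frame(
  length = round(runif(n, 10, 500)),
  sun    = factor(sample(c("Half", "Shade", "Sun"), n, replace = TRUE))
)
dat$Presence <- rbinom(n, 1, plogis(-1.5 + 0.003 * dat$length))

m <- glm(Presence ~ sun + length, data = dat, family = binomial)

grid <- data.frame(
  length = seq(10, 500, length.out = 100),
  sun    = factor("Sun", levels = levels(dat$sun))
)

# Predict on the link scale, build the interval there, then back-transform
p <- predict(m, newdata = grid, type = "link", se.fit = TRUE)
z <- qnorm(0.975)  # approximate 95% Wald interval
grid$fit   <- plogis(p$fit)
grid$lower <- plogis(p$fit - z * p$se.fit)
grid$upper <- plogis(p$fit + z * p$se.fit)

plot(dat$length, dat$Presence, xlab = "length", ylab = "Presence")
lines(grid$length, grid$fit)
lines(grid$length, grid$lower, lty = 2)
lines(grid$length, grid$upper, lty = 2)
```

The resulting grid data frame has the same column names as the effects output used above, so it could be passed to geom_line() and geom_ribbon() in the same way.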


We can see that the effect of length is not very impressive.
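
For the categorical predictors a curve is not meaningful. One option is a predicted probability per level, with the continuous predictor held at a typical value. The sketch below uses simulated stand-in data (dat, m, nd and the simulated effect sizes are assumptions for illustration) and fixes length at its median.

```r
# Simulated stand-in data (hypothetical values, for illustration only)
set.seed(3)
n <- 300
lev <- c("Monogyna", "Other", "Prunus", "Rosa")
dat <- data.frame(
  length  = round(runif(n, 10, 500)),
  species = factor(sample(lev, n, replace = TRUE), levels = lev)
)
shift <- c(Monogyna = 0, Other = -0.5, Prunus = 1, Rosa = 0.3)
eta <- -1 + 0.002 * dat$length + shift[as.character(dat$species)]
dat$Presence <- rbinom(n, 1, plogis(eta))

m <- glm(Presence ~ species + length, data = dat, family = binomial)

# One row per species level, with length fixed at its median
nd <- data.frame(
  species = factor(lev, levels = lev),
  length  = median(dat$length)
)
p <- predict(m, newdata = nd, type = "link", se.fit = TRUE)
z <- qnorm(0.975)
nd$fit   <- plogis(p$fit)
nd$lower <- plogis(p$fit - z * p$se.fit)
nd$upper <- plogis(p$fit + z * p$se.fit)

# Point estimate per level with an approximate 95% interval bar
x <- seq_len(nrow(nd))
plot(x, nd$fit, ylim = c(0, 1), xaxt = "n", pch = 19,
     xlab = "species", ylab = "Probability")
axis(1, at = x, labels = as.character(nd$species))
segments(x, nd$lower, x, nd$upper)
```

The same pattern should work for site, aspect, sun or leaf by swapping the factor that varies in nd.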

