Plotting a logistic regression line with ggplot: warning message: Computation failed in `stat_smooth()`: unused argument (data = data)

Question

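For practice, I am trying to plot a binary logistic regression line with ggplot using a real data set. The question: is the distance in kilometers a predictor of choosing the car as the means of transport to the football stadium?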

The variable A2 was dichotomized (1 = Auto (car), 0 = kein Auto (no car)) and is now called A2_auto:

dataset %>%
mutate(A2_auto = car::recode(.$A2,
"1 = 1; 2:9 = 0", 
as.factor = FALSE)) -> dataset

dataset$A2_auto <- factor(dataset$A2_auto, labels = c("kein Auto",
                                                        "Auto"))

After computing the coefficients (significant, but with a very low odds ratio), I wanted to plot the regression curve with ggplot:

ggplot(data=dataset, aes(x=A21, y=A2_auto)) + 
  geom_point(alpha=.5) +
  stat_smooth(method="glm.fit", se=FALSE, method.args = list(family=binomial))

But I get a warning message:

>`geom_smooth()` using formula 'y ~ x'
Warnmeldung:
Computation failed in `stat_smooth()`:
Unused Argument (data = data)

There is no regression line in the scatter plot, and I cannot figure out why.

Here is the structure of the data frame:

'data.frame':   689 obs. of  3 variables:
 $ A2     : dbl+lbl [1:689] 1, 1, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 6, 7, 7, 3, 6, 6, 6, 6, 6, 6, 6...
   ..@ label        : chr "Mit welchem Verkehrsmittel legen Sie die größte Distanz zum Stadion zurück, wenn Sie ein Bundesliga-Heimspiel b"| __truncated__
   ..@ format.spss  : chr "F40.0"
   ..@ display_width: int 0
   ..@ labels       : Named num  1 2 3 4 5 6 7 8 9
   .. ..- attr(*, "names")= chr [1:9] "PKW" "Bahn (Fernverkehr)" "Bahn (Nahverkehr)" "Fernbus" ...
 $ A21    : num  1 1 1 1 1 1 1 1 1 1 ...
  ..- attr(*, "label")= chr "Distanz in km"
  ..- attr(*, "format.spss")= chr "F8.2"
  ..- attr(*, "display_width")= int 0
 $ A2_auto: Factor w/ 2 levels "kein Auto","Auto": 2 2 1 1 1 1 1 1 1 1 ...

Thanks for your help!

Edit 1: Here is the output of dput(head(dataset, 50)):

structure(list(A2 = structure(c(1, 1, 6, 6, 6, 7, 7, 7, 7, 7, 
7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 6, 6, 6, 6, 7, 7, 7, 7, 7, 
7, 7, 7, 7, 7, 7, 7, 7, 7, 6, 7, 7, 3, 6, 6, 6, 6, 6, 6), label = "Mit welchem Verkehrsmittel legen Sie die größte Distanz zum Stadion zurück, wenn Sie ein Bundesliga-Heimspiel besuchen? - Selected Choice", format.spss = "F40.0", display_width = 0L, labels = c(PKW = 1, 
`Bahn (Fernverkehr)` = 2, `Bahn (Nahverkehr)` = 3, Fernbus = 4, 
`Fan-/Reisebus` = 5, ÖPNV = 6, Fahrrad = 7, `Zu Fuß` = 8, Sonstige = 9
), class = c("haven_labelled", "vctrs_vctr", "double")), A21 = c(1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 4, 4, 
6, 6, 6, 6, 6, 6, 6), A2_auto = structure(c(2L, 2L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("kein Auto", 
"Auto"), class = "factor")), row.names = c(NA, 50L), class = "data.frame")

When I change the method from glm.fit to glm, I get a different warning message:

ggplot(data=dataset, aes(x=A21, y=A2_auto)) + 
  geom_point(alpha=.5) +
  stat_smooth(method="glm", se=FALSE, method.args = list(family=binomial))

Output:

`geom_smooth()` using formula 'y ~ x'
Warnmeldungen:
1: glm.fit: algorithm did not converge 
2: Computation failed in `stat_smooth()`:
y values must be 0 <= y <= 1

I also dichotomized the variable as 0 and 1 (without a factor), and the same error occurred:

dataset %>%
  mutate(A2_auto = car::recode(.$A2,
  "1 = 1; 2:9 = 0", 
  as.factor = TRUE)) -> dataset

`geom_smooth()` using formula 'y ~ x'
Warnmeldungen:
1: glm.fit: algorithm did not converge 
2: Computation failed in `stat_smooth()`:
y values must be 0 <= y <= 1

I will try to reproduce my example with mtcars, as suggested in the comments.

Answer 1

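I think I have found the solution. After comparing with the mtcars data set, I took a closer look at the variable A2_auto in my data and found that it was not numeric after all. So I converted it again and dichotomized it. In addition, "glm" is the correct method, as described in the comments. Thanks again for the suggestions in the comments! It works now.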
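
For anyone hitting the same warnings, here is a minimal sketch of the working setup. It is not the original code: the data frame `toy` and its values are made up to mimic the survey columns, and A2 is a plain numeric column here (a labelled column imported from SPSS would probably need its labels stripped first). As far as I understand it, `stat_smooth()` calls the fitting function with a formula and `data = data`, which `glm.fit` does not accept, and a factor mapped to y is placed at positions 1 and 2, which the binomial family rejects. A numeric 0/1 response with `method = "glm"` avoids both problems.

```r
library(ggplot2)

# Hypothetical toy data standing in for the survey:
# A2 = transport code (1 = car, 2..9 = other), A21 = distance in km.
toy <- data.frame(
  A2  = c(1, 1, 6, 6, 7, 7, 8, 8, 3, 1, 1, 6, 1, 7, 1, 1),
  A21 = c(1, 1, 1, 1, 2, 2, 2, 4, 6, 6, 10, 10, 15, 20, 25, 30)
)

# Keep the response as a plain numeric 0/1 vector, not a factor.
toy$A2_auto <- as.numeric(toy$A2 == 1)

# method = "glm" accepts the formula/data call that stat_smooth() makes;
# family = binomial turns the fit into a logistic regression.
p <- ggplot(toy, aes(x = A21, y = A2_auto)) +
  geom_point(alpha = 0.5) +
  stat_smooth(method = "glm", formula = y ~ x, se = FALSE,
              method.args = list(family = binomial))

print(p)
```

If the points should still be labelled "kein Auto" and "Auto", one option is to keep the numeric column for plotting and set the axis labels with `scale_y_continuous(breaks = c(0, 1), labels = c("kein Auto", "Auto"))`.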
