
Adding an offset term to a glm formula in geom_smooth and stat_fit_tidy

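I have a data.frame with counts for each of two groups in each of three clusters. I am fitting a logistic regression to it (a binomial glm with a logit link function) and plotting it with ggplot2's geom_bar and geom_smooth, adding a p-value with ggpmisc's stat_fit_tidy.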

Here is what it looks like:

Data:

library(dplyr)

observed.probability.df <- data.frame(cluster = c("c1","c1","c2","c2","c3","c3"), group = rep(c("A","B"),3), p = c(0.4,0.6,0.5,0.5,0.6,0.4))
observed.data.df <- do.call(rbind,lapply(c("c1","c2","c3"), function(l){
  do.call(rbind,lapply(c("A","B"), function(g)
    data.frame(cluster = l, group = g, value = c(rep(0,1000*dplyr::filter(observed.probability.df, cluster == l & group != g)$p),rep(1,1000*dplyr::filter(observed.probability.df, cluster == l & group == g)$p)))
  ))
}))

observed.probability.df$cluster <- factor(observed.probability.df$cluster, levels = c("c1","c2","c3"))
observed.data.df$cluster <- factor(observed.data.df$cluster, levels = c("c1","c2","c3"))
observed.probability.df$group <- factor(observed.probability.df$group, levels = c("A","B"))
observed.data.df$group <- factor(observed.data.df$group, levels = c("A","B"))

Plot:

library(ggplot2)
library(ggpmisc)

ggplot(observed.probability.df, aes(x = group, y = p, group = cluster, fill = group)) +
  geom_bar(stat = 'identity') +
  geom_smooth(data = observed.data.df, mapping = aes(x = group, y = value, group = cluster), color = "black", method = 'glm', method.args = list(family = binomial(link = 'logit'))) + 
  stat_fit_tidy(data = observed.data.df, mapping = aes(x = group, y = value, group = cluster, label = sprintf("P = %.3g", stat(x_p.value))), method = 'glm', method.args = list(formula = y ~ x, family = binomial(link = 'logit')), parse = T, label.x = "center", label.y = "top") +
  scale_x_discrete(name = NULL,labels = levels(observed.probability.df$group), breaks = sort(unique(observed.probability.df$group))) +
  facet_wrap(as.formula("~ cluster")) + theme_minimal() + theme(legend.title = element_blank()) + ylab("Fraction of cells")


Suppose I have an expected probability for each group, which I would like to add as an offset to the geom_smooth and stat_fit_tidy glms. How do I do that?

Following this Cross Validated post, I added these offsets (on the logit scale) to observed.data.df:

observed.data.df <- observed.data.df %>% dplyr::left_join(data.frame(group = c("A","B"), p = qlogis(c(0.45,0.55))))
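As a side note on why the expected probabilities are passed through qlogis: the offset enters the linear predictor, which for a logit link is on the log-odds scale. A minimal sketch with made-up counts (400 successes out of 1000, expected probability 0.45; the names value, off, and fit are illustrative only) shows that the intercept then measures the departure of the observed log-odds from the expected log-odds:

```r
# Illustrative data: 400 ones and 600 zeros, i.e. an observed fraction of 0.4
value <- rep(c(0, 1), c(600, 400))

# Expected probability 0.45, converted to log-odds for use as an offset
off <- rep(qlogis(0.45), length(value))

# Intercept-only logistic regression with the offset in the formula
fit <- glm(value ~ 1 + offset(off), family = binomial(link = "logit"))

# The fitted intercept should match observed minus expected log-odds
observed.minus.expected <- qlogis(0.4) - qlogis(0.45)
intercept <- coef(fit)[["(Intercept)"]]
print(c(intercept = intercept, observed.minus.expected = observed.minus.expected))
```

A Wald test on that intercept is then a test of whether the observed fraction differs from the expected one.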

Then I tried adding the offset(p) expression to geom_smooth and stat_fit_tidy:

ggplot(observed.probability.df, aes(x = group, y = p, group = cluster, fill = group)) +
  geom_bar(stat = 'identity') +
  geom_smooth(data = observed.data.df, mapping = aes(x = group, y = value, group = cluster), color = "black", method = 'glm', method.args = list(formula = y ~ x + offset(p), family = binomial(link = 'logit'))) + 
  stat_fit_tidy(data = observed.data.df, mapping = aes(x = group, y = value, group = cluster, label = sprintf("P = %.3g", stat(x_p.value))), method = 'glm', method.args = list(formula = y ~ x + offset(p), family = binomial(link = 'logit')), parse = T, label.x = "center", label.y = "top") +
  scale_x_discrete(name = NULL,labels = levels(observed.probability.df$group), breaks = sort(unique(observed.probability.df$group))) +
  facet_wrap(as.formula("~ cluster")) + theme_minimal() + theme(legend.title = element_blank()) + ylab("Fraction of cells")

But I get these warnings:

Warning messages:
1: Computation failed in `stat_smooth()`:
invalid type (closure) for variable 'offset(p)' 
2: Computation failed in `stat_smooth()`:
invalid type (closure) for variable 'offset(p)' 
3: Computation failed in `stat_smooth()`:
invalid type (closure) for variable 'offset(p)' 
4: Computation failed in `stat_fit_tidy()`:
invalid type (closure) for variable 'offset(p)' 
5: Computation failed in `stat_fit_tidy()`:
invalid type (closure) for variable 'offset(p)' 
6: Computation failed in `stat_fit_tidy()`:
invalid type (closure) for variable 'offset(p)' 

This means the added term is not recognized, and the plot appears with the bars only.

Any idea how to add the offset term to the geom_smooth and stat_fit_tidy glms? Or even just to the geom_smooth glm (commenting out the stat_fit_tidy line)?

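Alternatively, is it possible to add to the geom_bar plot the predicted regression line, SE, and p-value obtained by fitting the glm outside of the ggplot call ( fit <- glm(value ~ group + offset(p), data = observed.data.df, family = binomial(link = 'logit')) )?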

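The problem is that in ggplot, the names in a model formula (here x and y) represent aesthetics, not the names of variables in data. There is no p aesthetic, so p is not found when the fit is attempted. A numeric vector cannot be passed here either, because ggplot splits the data into groups and fits the model separately for each group, whereas only a single constant vector of numbers could be passed in. I think one would need to define a new pseudo-aesthetic and its corresponding scale to be able to do fits in this way.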
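One possible workaround, along the lines of the alternative raised in the question, is to fit the glm per cluster outside ggplot and add the predictions as ordinary layers. The following is only a sketch under several assumptions: the offset column is named off rather than p (to avoid confusion with the p column of observed.probability.df), the band is a Wald interval of plus or minus 1.96 SE on the link scale, the p-value is the Wald p-value of the group coefficient, and the helper names expected, fits, pred.df, pval.df, and plt are made up for illustration.

```r
library(ggplot2)

# Rebuild the question's data compactly: 1000 binary outcomes per cluster and group
observed.probability.df <- data.frame(
  cluster = rep(c("c1", "c2", "c3"), each = 2),
  group   = factor(rep(c("A", "B"), 3), levels = c("A", "B")),
  p       = c(0.4, 0.6, 0.5, 0.5, 0.6, 0.4)
)
observed.data.df <- do.call(rbind, lapply(seq_len(nrow(observed.probability.df)), function(i) {
  n1 <- round(1000 * observed.probability.df$p[i])
  data.frame(cluster = observed.probability.df$cluster[i],
             group   = observed.probability.df$group[i],
             value   = rep(c(0, 1), c(1000 - n1, n1)))
}))

# Expected probabilities per group, stored as a logit-scale offset column
expected <- c(A = 0.45, B = 0.55)
observed.data.df$off <- unname(qlogis(expected[as.character(observed.data.df$group)]))

# One glm per cluster, with the offset written into the formula
fits <- lapply(split(observed.data.df, observed.data.df$cluster), function(d) {
  glm(value ~ group + offset(off), data = d, family = binomial(link = "logit"))
})

# Predictions and Wald intervals on the link scale, back-transformed to probabilities
pred.df <- do.call(rbind, lapply(names(fits), function(cl) {
  nd <- data.frame(group = factor(c("A", "B"), levels = c("A", "B")),
                   off   = unname(qlogis(expected)))
  pr <- predict(fits[[cl]], newdata = nd, type = "link", se.fit = TRUE)
  data.frame(cluster = cl, group = nd$group,
             fit   = plogis(as.numeric(pr$fit)),
             lower = plogis(as.numeric(pr$fit - 1.96 * pr$se.fit)),
             upper = plogis(as.numeric(pr$fit + 1.96 * pr$se.fit)))
}))

# Wald p-value of the group effect (relative to the expected difference) per cluster
pval.df <- data.frame(
  cluster = names(fits),
  p.value = sapply(fits, function(f) coef(summary(f))["groupB", "Pr(>|z|)"])
)

plt <- ggplot(observed.probability.df, aes(x = group, y = p, fill = group)) +
  geom_col() +
  geom_line(data = pred.df, aes(x = group, y = fit, group = cluster),
            inherit.aes = FALSE) +
  geom_errorbar(data = pred.df, aes(x = group, ymin = lower, ymax = upper),
                width = 0.2, inherit.aes = FALSE) +
  geom_text(data = pval.df, aes(x = 1.5, y = 0.75, label = sprintf("P = %.3g", p.value)),
            inherit.aes = FALSE) +
  facet_wrap(~ cluster) + theme_minimal() +
  theme(legend.title = element_blank()) + ylab("Fraction of cells")
```

Printing plt draws the figure. Because the model has one parameter per group, the fitted probabilities coincide with the observed fractions; the offset changes what the group coefficient and its p-value measure (departure from the expected difference), not the fitted line itself.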
