Calculate and plot mean + confidence interval for multiple categories with a Poisson distribution in R

Question

I'm having trouble plotting means with confidence intervals for my dataset. Simplified, my dataset consists of 2 columns:

df <- data.frame(
  category = c("a", "d", "a", "q", "d", "d", "q", "d", "a", "q"),
  count = c(3, 2, 0, 5, 0, 4, 8, 0, 2, 4)
)

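So it has 3 categories (a, d and q), each with corresponding count data. My real dataset follows a Poisson distribution.

I want to calculate the mean and a confidence interval for each category and plot them in a bar chart.

Since the categories have different lengths, I made a subset for each category and tried the following: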

        SE<- function(x) sd(x)/sqrt(length(x))
        lim1<-function(x) mean(x)-1.96*SE(x)
        lim2<-function(x) mean(x)+1.96*SE(x)

        confidence1a<-apply(a$count, lim1) 
        confidence2a<-apply(a$count, lim2)

        confidence1d<-apply(d$count, lim1) 
        confidence2d<-apply(d$count, lim2)

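I planned to bind them into one dataset afterwards.

But this resulted in an error: Error in apply(a$count, FUN = lim1) : dim(X) must have a positive length

How can I solve this without having to write out the formula for every subset? My real dataset has more than 8 categories... Also, it would be even better not to have to subset each category first.

If anyone could turn this into some nice code, I would be eternally grateful!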

Answer 1

library(tidyverse)

df <- tibble(
  category = c("a", "d", "a", "q", "d", "d", "q", "d", "a", "q"),
  count =  c(3, 2, 0, 5, 0, 4, 8, 0, 2, 4)
) %>%  
  arrange_all()

df %>%
  group_by(category) %>%
  # Standard error of the mean is sd / sqrt(n), matching SE() in the question
  mutate(mean = mean(count),
         conf_lower = mean - 1.96 * (sd(count) / sqrt(length(count))),
         conf_upper = mean + 1.96 * (sd(count) / sqrt(length(count))))

# A tibble: 10 x 5
# Groups:   category [3]
   category count  mean conf_lower conf_upper
   <chr>    <dbl> <dbl>      <dbl>      <dbl>
 1 a            0  1.67    -0.0619       3.40
 2 a            2  1.67    -0.0619       3.40
 3 a            3  1.67    -0.0619       3.40
 4 d            0  1.5     -0.377        3.38
 5 d            0  1.5     -0.377        3.38
 6 d            2  1.5     -0.377        3.38
 7 d            4  1.5     -0.377        3.38
 8 q            4  5.67     3.31         8.02
 9 q            5  5.67     3.31         8.02
10 q            8  5.67     3.31         8.02
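
As a side note on the error in the question: apply() only works on objects with a dim attribute (matrices or arrays), and a$count is a plain vector, which is why it complains about dim(X). A minimal base R sketch, assuming the same example data and reusing the question's own SE, lim1 and lim2 helpers, is to let tapply() do the per-category split. The name ci_normal is only illustrative.

```r
# Rebuild the example data as a plain data frame (same values as the question).
df <- data.frame(
  category = c("a", "d", "a", "q", "d", "d", "q", "d", "a", "q"),
  count    = c(3, 2, 0, 5, 0, 4, 8, 0, 2, 4)
)

# Helper functions from the question (normal approximation).
SE   <- function(x) sd(x) / sqrt(length(x))
lim1 <- function(x) mean(x) - 1.96 * SE(x)
lim2 <- function(x) mean(x) + 1.96 * SE(x)

# tapply() splits count by category and applies the function to each group,
# so no manual subsetting per category is needed.
means <- tapply(df$count, df$category, mean)

ci_normal <- data.frame(
  category = names(means),
  mean     = as.vector(means),
  lower    = as.vector(tapply(df$count, df$category, lim1)),
  upper    = as.vector(tapply(df$count, df$category, lim2))
)

print(ci_normal)
```

This gives one row per category, which is usually the more convenient shape for a bar chart with error bars.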

Answer 2

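Some basic data manipulation with dplyr makes this easy to plot with ggplot. Your calculation of the confidence interval for a Poisson distribution isn't quite right here (it should not produce negative values), so I have changed it to the appropriate calculation: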

library(tidyverse)

df %>%
  group_by(category) %>%
  summarize(mean = mean(count),
            upper = mean(count) + 1.96 * sqrt(mean(count)/n()),
            lower = mean(count) - 1.96 * sqrt(mean(count)/n())) %>%
  ggplot(aes(category, mean)) +
  geom_col(fill = 'deepskyblue4') +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.5) +
  theme_minimal(base_size = 16)
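
The calculation above relies on the fact that the variance of a Poisson variable equals its mean, so the standard error of the mean is sqrt(mean / n). This is still a normal approximation, and the lower limit can dip below zero when a category's mean is very small. As a hedged alternative, the sketch below uses the exact Poisson interval from base R's poisson.test(), applied to each category's total count with the number of observations as the time base T. The data frame is rebuilt so the block is self-contained, and the names ci_exact, n_obs and total are only illustrative.

```r
library(dplyr)

# Same example data as in the question.
df <- data.frame(
  category = c("a", "d", "a", "q", "d", "d", "q", "d", "a", "q"),
  count    = c(3, 2, 0, 5, 0, 4, 8, 0, 2, 4)
)

ci_exact <- df %>%
  group_by(category) %>%
  summarize(
    n_obs = n(),          # number of observations in the category
    total = sum(count),   # total count, itself Poisson distributed
    mean  = total / n_obs,
    # Exact 95% interval for the Poisson rate per observation;
    # the lower limit is never negative.
    lower = poisson.test(total, T = n_obs)$conf.int[1],
    upper = poisson.test(total, T = n_obs)$conf.int[2]
  )

print(ci_exact)
```

Because ci_exact has mean, lower and upper columns, it can be piped into the same ggplot() call as above in place of the summarize() step.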


Solution 1
0 2022-07-21 15:42:23

Solution 2
0 2022-07-21 15:45:00
