
Aggregate data and calculate confidence intervals for sub-cohorts in R

I have a dataset:

testData <- structure(list(group = c("Group1", "Group1", "Group1", "Group1", 
                                 "Group1", "Group1", "Group1", "Group1", "Group1", "Group1", "Group1", 
                                 "Group1", "Group1", "Group1", "Group1", "Group1", "Group1", "Group1", 
                                 "Group1", "Group1", "Group1", "Group1", "Group1", "Group1", "Group1", 
                                 "Group1", "Group1", "Group1", "Group1", "Group1", "Group2", "Group2", 
                                 "Group2", "Group2", "Group2", "Group2", "Group2", "Group2", "Group2", 
                                 "Group2", "Group2", "Group2", "Group2", "Group2", "Group2", "Group2", 
                                 "Group2", "Group2", "Group2", "Group2", "Group2", "Group2"), 
                       year = c(2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 
                                2015, 2015, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 
                                2016, 2016, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 
                                2017, 2017, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 
                                2016, 2016, 2016, 2016, 2016, 2016, 2016, 2017, 2017, 2017, 
                                2017, 2017, 2017, 2017), category = c("cat1", "cat1", "cat1", 
                                                                      "cat1", "cat1", "cat2", "cat2", "cat2", "cat2", "cat2", "cat1", 
                                                                      "cat1", "cat1", "cat1", "cat1", "cat2", "cat2", "cat2", "cat2", 
                                                                      "cat2", "cat1", "cat1", "cat1", "cat1", "cat1", "cat2", "cat2", 
                                                                      "cat2", "cat2", "cat2", "cat2", "cat2", "cat2", "cat2", "cat2", 
                                                                      "cat2", "cat2", "cat3", "cat3", "cat3", "cat3", "cat3", "cat3", 
                                                                      "cat3", "cat3", "cat3", "cat3", "cat3", "cat1", "cat1", "cat1", 
                                                                      "cat1"), value = c(30.1660205462388, 96.1649663179749, 183.691571800985, 
                                                                                         1.65328912643215, 9.30044741412784, 182.449748512614, 8.47095574122154, 
                                                                                         23.3081277048748, 53.1188233968077, 34.250829201039, 50.5445297997031, 
                                                                                         120.307165280983, 140.223343284331, 122.319359028798, 43.0193263100948, 
                                                                                         134.417238652291, 106.437343685401, 84.0446901587849, 69.7099679759042, 
                                                                                         132.101156129094, 27.8329259333861, 58.4953521410472, 100.379478360197, 
                                                                                         77.2357869871934, 200.464054913284, 47.6252352008202, 109.598360734847, 
                                                                                         18.1730751285375, 67.5769989539879, 26.7504753716622, 16.8630228114074, 
                                                                                         75.2053705357279, 39.7641860921024, 126.658782796637, 64.8507816634371, 
                                                                                         96.3471066298501, 61.4392604693245, 27.6801895514785, 181.599217867455, 
                                                                                         11.1036117561468, 68.1516849014302, 115.899355317842, 167.032368398535, 
                                                                                         116.634854779718, 144.080455202308, 186.627050299051, 72.3807151133032, 
                                                                                         37.6345953992576, 2.09517321452513, 58.3682650864716, 54.3590148062561, 
                                                                                         53.9884625670805)), row.names = c(NA, -52L), class = c("data.table", 
                                                                                                                                                "data.frame"))

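I want to aggregate the data at different levels and compute a confidence interval for value at each aggregation level. For example, I have defined two sets of grouping columns to aggregate by: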

cohort1 = c("group" ,"category", "year")
cohort2 = c("group" ,"category")

I wrote a function to compute the confidence interval:

calculateCI <- function(value){
  
  avg <- mean(value)
  s <- sqrt(var(value))
  n <- length(value)
  
  error <- qnorm(0.975)*s/sqrt(n)
  
  lower <- avg - error
  upper <- avg + error 
  
  return(list(lowerCI = lower, 
              upperCI = upper))
  
}

How can I aggregate the data and compute the confidence intervals?

I tried using dplyr:

testData %>%
  group_by(cohort) %>%
  group_map(~ calculateCI(.x$value))

But this does not work with the vector cohort. How can I pass a vector of column names as the argument to group_by?

In addition, I would like the result as a data.table with one column each for the lower and upper confidence bounds:

     group category year sumValue lowerCi upperCi
 1: Group1     cat1 2015 320.9763     xxx     yyy
 2: Group1     cat2 2015 301.5985     xxx     yyy
 3: Group1     cat1 2016 476.4137     xxx     yyy
 4: Group1     cat2 2016 526.7104     xxx     yyy
 5: Group1     cat1 2017 464.4076     xxx     yyy
 6: Group1     cat2 2017 269.7241     xxx     yyy
 7: Group2     cat2 2016 481.1285     xxx     yyy
 8: Group2     cat3 2016 832.1817     xxx     yyy
 9: Group2     cat3 2017 296.6424     xxx     yyy
10: Group2     cat1 2017 168.8109     xxx     yyy

You can compute the mean and SD by group:

tapply(testData$value, INDEX = list(testData$group), FUN = mean, na.rm = TRUE) 
tapply(testData$value, INDEX = list(testData$group), FUN = sd, na.rm = TRUE) 

Or by more than one factor:

tapply(testData$value, INDEX = list(testData$category, testData$group), FUN = mean, na.rm = TRUE) 
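
The per-group mean and SD from tapply can also be combined into an interval in base R. The following is only a minimal sketch: toy is a made-up data frame standing in for testData, and the names avg, s, n, err and ci are introduced here for illustration. It uses the same normal-approximation formula as calculateCI in the question.

```r
# Toy data with the same column names as testData (values are made up)
toy <- data.frame(
  group = c("A", "A", "A", "B", "B", "B"),
  value = c(1, 2, 3, 10, 20, 30)
)

# Per-group mean, SD and number of non-missing observations
avg <- tapply(toy$value, toy$group, mean, na.rm = TRUE)
s   <- tapply(toy$value, toy$group, sd, na.rm = TRUE)
n   <- tapply(toy$value, toy$group, function(x) sum(!is.na(x)))

# Half-width of the normal-approximation 95% interval
err <- qnorm(0.975) * s / sqrt(n)

# Collect everything in one data frame, one row per group
ci <- data.frame(
  group   = names(avg),
  mean    = as.vector(avg),
  lowerCI = as.vector(avg - err),
  upperCI = as.vector(avg + err)
)
ci
```

With more than one grouping factor, tapply returns a matrix that holds NA for combinations that do not occur in the data, so a long-format result like the one requested is usually easier to build with a grouped summary.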

To compute the CI with dplyr:

library(dplyr)

testData %>%
  group_by(group, category, year) %>%
  # n, mean and standard deviation of value within each cohort
  summarise(across(.cols = value,
                   .fns = list(n = ~ n(), Mean = mean, StDev = sd),
                   .names = "{.fn}"),
            .groups = "drop") %>%
  # standard error of the mean
  mutate(StE = StDev / sqrt(n)) %>%
  # normal-approximation 95% interval
  mutate(LowerCI95 = Mean - (1.96 * StE),
         UpperCI95 = Mean + (1.96 * StE))

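The question also asks how to group by a character vector such as cohort1 or cohort2 and get a data.table back. One possible approach, sketched below under assumptions, is to wrap the vector in across(all_of(...)) for dplyr, or to pass it straight to by in data.table. The toy data, toy_dt, res_dplyr and res_dt are made-up names for illustration, and calculateCI is a condensed copy of the function from the question.

```r
library(dplyr)
library(data.table)

# CI helper from the question (normal approximation), condensed
calculateCI <- function(value) {
  error <- qnorm(0.975) * sd(value) / sqrt(length(value))
  list(lowerCI = mean(value) - error, upperCI = mean(value) + error)
}

# Toy data (made-up values) with the same columns as testData
toy <- data.frame(
  group    = rep(c("Group1", "Group2"), each = 4),
  category = rep(c("cat1", "cat2"), times = 4),
  year     = rep(c(2015, 2015, 2016, 2016), times = 2),
  value    = c(1, 2, 3, 4, 10, 20, 30, 40)
)

cohort2 <- c("group", "category")

# dplyr: all_of() turns the character vector into a column selection
res_dplyr <- toy %>%
  group_by(across(all_of(cohort2))) %>%
  summarise(sumValue = sum(value),
            lowerCI  = calculateCI(value)$lowerCI,
            upperCI  = calculateCI(value)$upperCI,
            .groups  = "drop")

# data.table: `by` accepts a character vector of column names, and each
# element of the list returned in j becomes its own column
toy_dt <- as.data.table(toy)
res_dt <- toy_dt[, c(list(sumValue = sum(value)), calculateCI(value)),
                 by = cohort2]
res_dt
```

Swapping cohort2 for another vector of column names changes the aggregation level without touching the rest of the code. Cohorts that contain a single observation have an undefined standard deviation, so their bounds come out as NA.
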