
A method to calculate the median age in a race table in R

I need your help finding the median age in the race table I have. Here is a sample of my data:

                  
                   Asian Black Hispanic White
  15                   0     0        0     6
  17                   0     0        0     9
  19                   0     0        0     8
  20                   0     0        0    12
  20.8388888888889     2     0        0     0
  20.8583333333333     2     0        0     0
  21                   1     7        1    31
  21.4888888888889     0     0        2     0
  21.5277777777778     0     0        2     0

The first column continues up to age 99. I tried doing this with the tapply function as shown below, but I only got the median for the Asian group.

RaceAge= tapply(age,RACE, median)
RaceAge
   Asian    Black Hispanic    White 
      43       NA       NA       NA 

Thank you very much.

data <- data.frame(age = c(15, 17, 19, 20, 20.8388888888889, 20.8583333333333, 
21, 21.4888888888889, 21.5277777777778), Asian = c(0L, 0L, 0L, 
0L, 2L, 2L, 1L, 0L, 0L), Black = c(0L, 0L, 0L, 0L, 0L, 0L, 7L, 
0L, 0L), Hispanic = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 2L, 2L), White = c(6L, 
9L, 8L, 12L, 0L, 0L, 31L, 0L, 0L))

apply(data[-1], 2, \(x) median(rep(data$age,x)))
#>    Asian    Black Hispanic    White 
#> 20.85833 21.00000 21.48889 20.00000
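
To see why the one-liner above works: rep(data$age, x) repeats each age as many times as its count, which rebuilds one value per person, and median() is then taken over that expanded vector. A minimal sketch for a single column, using the Hispanic counts from the sample data (the names ages, counts and expanded are only illustrative, not from the answer):

```r
# Ages with a non-zero Hispanic count in the sample data, and their counts
ages   <- c(21, 21.4888888888889, 21.5277777777778)
counts <- c(1L, 2L, 2L)

# Repeat each age by its count to recover one value per person
expanded <- rep(ages, counts)
expanded

# The ordinary median of the expanded vector is the group median
median(expanded)
```

This materialises one value per person, which is fine for a table of this size but may be worth avoiding when the counts are very large.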

We could use

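library(dplyr)
library(tibble)    # provides rownames_to_column()
library(spatstat)  # provides weighted.median()
# The row names hold the ages as strings, so convert them before use
as.data.frame.matrix(tbl1) %>% 
  rownames_to_column('age') %>%
  summarise(across(-age, ~ weighted.median(as.numeric(age), w = .x, type = 1)))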

-output

  Asian Black Hispanic White
1    26    17       19  21.8

Or another option is

df2 <- transform(as.data.frame(tbl1),
    AgeNiki = as.numeric(as.character(AgeNiki)))
by(df2, df2$Var2, FUN = function(x) weighted.median(x$AgeNiki,
     w = x$Freq, type = 1))
 df2$Var2: Asian
[1] 26
--------------------------------------------------------------------------------------------------------------------- 
df2$Var2: Black
[1] 17
--------------------------------------------------------------------------------------------------------------------- 
df2$Var2: Hispanic
[1] 19
--------------------------------------------------------------------------------------------------------------------- 
df2$Var2: White
[1] 21.8

data

tbl1 <- structure(c(0L, 0L, 0L, 0L, 2L, 2L, 1L, 0L, 0L, 0L, 0L, 5L, 0L, 
0L, 0L, 5L, 7L, 13L, 11L, 4L, 0L, 0L, 0L, 0L, 0L, 0L, 7L, 0L, 
0L, 0L, 0L, 3L, 3L, 0L, 0L, 0L, 8L, 6L, 31L, 8L, 0L, 0L, 0L, 
0L, 0L, 0L, 1L, 2L, 2L, 2L, 2L, 0L, 0L, 0L, 0L, 1L, 4L, 5L, 26L, 
28L, 6L, 9L, 8L, 12L, 0L, 0L, 31L, 0L, 0L, 0L, 0L, 36L), dim = c(18L, 
4L), dimnames = list(AgeNiki = c("15", "17", "19", "20", "20.8", 
"21", "21.4", "21.5", "21.8", "21.9", "22", "23", "23.1", "23.2", 
"24", "25", "26", "27"), c("Asian", "Black", "Hispanic", "White"
)), class = "table")
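
If loading spatstat only for weighted.median() is not an option, the same idea can be roughly sketched in base R with a running total of the counts. Note that weighted_median_lower is a hypothetical helper, not part of the answer above, and it returns the lower weighted median (the first age at which the running count reaches half the total). That can differ from median() when the total is even and the two middle observations have different ages, so treat it as a sketch under that assumption.

```r
# Hypothetical helper: lower weighted median of x with counts w
weighted_median_lower <- function(x, w) {
  keep <- w > 0                    # drop ages with a zero count
  x <- x[keep]
  w <- w[keep]
  ord <- order(x)                  # sort ages, carrying counts along
  x <- x[ord]
  w <- w[ord]
  cum <- cumsum(w)                 # running number of people
  x[which(cum >= sum(w) / 2)[1]]   # first age reaching half the total
}

# Ages with a non-zero White count in the question's sample data
ages  <- c(15, 17, 19, 20, 21)
white <- c(6L, 9L, 8L, 12L, 31L)
weighted_median_lower(ages, white)
```

Unlike the rep() approach, this never expands the table, so memory use depends on the number of distinct ages rather than the number of people.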

You can use DescTools::Untable() together with tapply():

with(DescTools::Untable(tbl1, colnames = c("age", "race")),
     tapply(as.numeric(levels(age))[age], race, median))

#   Asian    Black Hispanic    White
#    26.0     17.0     19.0     21.8

Or with aggregate():

aggregate(age ~ race, DescTools::Untable(tbl1, colnames = c("age", "race")),
          \(x) median(as.numeric(levels(x))[x]))

#       race  age
# 1    Asian 26.0
# 2    Black 17.0
# 3 Hispanic 19.0
# 4    White 21.8
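
For readers without DescTools, the expansion step can be approximated in base R: turn the table into long form with as.data.frame() and repeat each row Freq times. A rough sketch, assuming a small table built from the first three rows of tbl1 for two groups (tbl, long and obs are illustrative names, not from the answer):

```r
# First three ages of tbl1, Black and Hispanic columns only
tbl <- as.table(matrix(
  c(11L, 4L, 0L,    # Black counts at ages 15, 17, 19
    8L, 6L, 31L),   # Hispanic counts at ages 15, 17, 19
  nrow = 3,
  dimnames = list(age = c("15", "17", "19"),
                  race = c("Black", "Hispanic"))
))

# Long form: one row per (age, race) pair with its count in Freq
long <- as.data.frame(tbl, stringsAsFactors = FALSE)

# Repeat each row Freq times to get one row per person
obs <- long[rep(seq_len(nrow(long)), long$Freq), c("age", "race")]
obs$age <- as.numeric(obs$age)

# Median age per group
res <- tapply(obs$age, obs$race, median)
res
```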

The tbl1 data is taken from @akrun's answer.

