R long data levels count

Hello (sorry, I'm not sure how to make the title more specific). I have a table in long format and would like to count how many IDs fall into each AGE level. Below is the sample data. For AGE I expect something like 18-24: 2, 25-34: 1 (where 2 and 1 are the counts). Similarly, for GENDER I expect something like M: 1, F: 2.

dd <- data.frame("ID" = c(1,1,1,1,1,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3),
                "GROUP" = c(1,1,1,1,1,3,3,3,3,3,3,3,3,3,2,2,2,2,2,2),
                "GENDER" = c("M","M","M","M","M","F","F","F","F","F","F","F","F","F","F","F","F","F","F","F"),
                "AGE" = c("25-34","25-34","25-34","25-34","25-34","18-24","18-24","18-24","18-24","18-24","18-24","18-24","18-24","18-24","18-24","18-24","18-24","18-24","18-24","18-24"),
                "VALUE" = c(0.26,0.14,0.55,0.03,0.48,0.39,0.31,0.16,0.33,0.20,0.29,0.54,0.25,0.68,0.39,0.56,0.48,0.29,0.54,0.25),
                "ANCHOR" = c(4,5,6,3,6,3,2,4,1,5,2,1,6,5,3,4,3,2,1,6),
                "SPEED" = c(106,53,159,53,159,159,106,53,106,53,53,106,53,159,159,159,106,53,106,53))

I can do this for wide-format data using table(as.factor(dd$AGE)), but it doesn't work here because it counts rows rather than IDs (for GENDER, for example, it returns F: 15, M: 5). What's the best way to do this? I tried using filter and group_by but couldn't get it to work. Thanks!

I am not sure if this is the sort of output you want:

lapply(
  c("AGE", "GENDER"),
  function(v) aggregate(as.formula(paste0("ID ~", v)), dd, function(x) length(unique(x)))
)

giving

[[1]]
    AGE ID
1 18-24  2
2 25-34  1

[[2]]
  GENDER ID
1      F  2
2      M  1
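If a plain named vector closer to the "18-24: 2" shape in the question is preferred, the same idea (count unique IDs within each level) can be sketched with split() and sapply(). This is only a sketch: count_ids is a hypothetical helper name, and dd here is a condensed rebuild of just the ID, GENDER and AGE columns from the sample data.

```r
# Condensed rebuild of the relevant columns of the sample data (assumption:
# only ID, GENDER and AGE matter for this count)
dd <- data.frame(
  ID     = rep(1:3, times = c(5, 9, 6)),
  GENDER = rep(c("M", "F", "F"), times = c(5, 9, 6)),
  AGE    = rep(c("25-34", "18-24", "18-24"), times = c(5, 9, 6))
)

# Hypothetical helper: split the IDs by the levels of `col`,
# then count the distinct IDs inside each level
count_ids <- function(data, col) {
  sapply(split(data$ID, data[[col]]), function(x) length(unique(x)))
}

age_counts    <- count_ids(dd, "AGE")
gender_counts <- count_ids(dd, "GENDER")
print(age_counts)
print(gender_counts)
```

The result is a named integer vector, so individual counts can be looked up by level name, e.g. age_counts[["18-24"]].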

You can reshape the columns of interest into long format and count the number of unique IDs for each value.

library(dplyr)
library(tidyr)

dd %>%
  select(ID, GENDER, AGE) %>%
  pivot_longer(cols = -ID) %>%
  group_by(name, value) %>%
  summarise(count = n_distinct(ID))

#  name   value count
#  <chr>  <chr> <int>
#1 AGE    18-24     2
#2 AGE    25-34     1
#3 GENDER F         2
#4 GENDER M         1

All previous answers look good. However, I think I found a simpler one that seems to work.

# Keep only the first row for each ID, then tabulate as usual
dd_unique <- dd[!duplicated(dd$ID), ]
table(dd_unique$AGE)

# 18-24 25-34 
#     2     1 

table(dd_unique$GENDER)
# F M 
# 2 1
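One caveat worth checking: keeping the first row per ID only gives the right answer if AGE and GENDER never change within an ID. That holds for the sample data, but it is an assumption for any other data set. A rough sketch of such a check, where per_id and ids_consistent are illustrative names and dd is a condensed rebuild of the relevant sample columns:

```r
# Condensed rebuild of the relevant columns of the sample data
dd <- data.frame(
  ID     = rep(1:3, times = c(5, 9, 6)),
  GENDER = rep(c("M", "F", "F"), times = c(5, 9, 6)),
  AGE    = rep(c("25-34", "18-24", "18-24"), times = c(5, 9, 6))
)

# One row per distinct ID/GENDER/AGE combination
per_id <- unique(dd[, c("ID", "GENDER", "AGE")])

# If an ID shows up more than once here, its AGE or GENDER varies
# across rows and "first row per ID" would silently pick one value
ids_consistent <- anyDuplicated(per_id$ID) == 0
print(ids_consistent)

# Safe to tabulate once the check passes
if (ids_consistent) {
  print(table(per_id$AGE))
  print(table(per_id$GENDER))
}
```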
