
Counting and listing all factor levels of all factors

I have a data frame in R like this:

 D          I        S       ...

 110       2012     1000
 111       2012     2000
 110       2012     1000
 111       2014     2000
 110       2013     1000
 111       2013     2000

I want to count how often each factor level occurs for each factor and save the result in a data frame like this:

 D     Count          I    Count           S    Count    ...

 110     3           2012      3          1000     3
 111     3           2013      2          2000     3
                     2014      1  

or this:

 D     Count    

 110     3     
 111     3     


  I    Count  

2012      3  
2013      2  
2014      1


 S    Count  

1000     3
2000     3

....

I tried sapply, levels, library(dplyr) and aggregate, but none of them produced the desired output. How can I do that?

Here is a solution using data.table:

data <- data.frame(D = rep(c("110", "111"), 3),
                   I = c(rep("2012", 3), "2014", "2013", "2013"),
                   S = rep(c("1000", "2000"), 3))
str(data)
# you just want
table(data$D)
table(data$I)
table(data$S)
# one option using data.table
library(data.table)
dt <- as.data.table(data)
dt # see dt
dt[, table(D)] # or dt[, .N, by = D], for one variable
paste(names(dt), "Count", sep = "_") # names of new count columns
# lapply(.SD, table) returns vectors shorter than nrow(dt), which := cannot assign;
# instead, for each column, store how many rows share that row's value
for (col in names(data)) {
  dt[, (paste(col, "Count", sep = "_")) := .N, by = c(col)]
}
dt # new dt
data2 <- as.data.frame(dt)[, sort(names(dt))]
data2 # final data frame

And a dplyr-style one (using the %>% pipe) for the second output:

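library(dplyr)  # provides the %>% pipe

counts <- data %>% 
  lapply(table) %>%         # one frequency table per column
  lapply(as.data.frame)     # convert each table to a data frame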
counts
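
The per-column tables can also be combined into the first, side-by-side layout from the question. The following is only a base R sketch under some assumptions: the shorter tables are padded with NA so that all columns have the same length, and the names `n_max`, `padded` and `wide` as well as the `_Count` suffix are chosen here for illustration.

```r
# example data from the question
data <- data.frame(D = rep(c("110", "111"), 3),
                   I = c(rep("2012", 3), "2014", "2013", "2013"),
                   S = rep(c("1000", "2000"), 3))

# one frequency table per column, as a data frame with columns x and Freq
counts <- lapply(data, function(x) as.data.frame(table(x), stringsAsFactors = FALSE))

# the largest number of distinct values found in any column
n_max <- max(sapply(counts, nrow))

# pad every table with NA rows up to n_max and name its two columns
padded <- lapply(names(counts), function(nm) {
  tab <- counts[[nm]]
  pad <- n_max - nrow(tab)
  out <- data.frame(c(tab[[1]], rep(NA, pad)),
                    c(tab$Freq, rep(NA, pad)))
  names(out) <- c(nm, paste0(nm, "_Count"))
  out
})

# bind the padded tables side by side
wide <- do.call(cbind, padded)
wide
```

Columns with fewer distinct values than `I` end in NA, which plays the role of the empty cells in the question's sketch.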

I think the most concise way to do it, which also stores the final output in a tidy format, is this:

library(tidyverse)

# example data
data <- data.frame(D = rep(c("110", "111"), 3),
                   I = c(rep("2012", 3), "2014", "2013", "2013"),
                   S = rep(c("1000", "2000"), 3))

data %>%
  gather(name, value) %>% # reshape dataset to long format
  count(name, value)      # count combinations

# # A tibble: 7 x 3
#    name value     n
#   <chr> <chr> <int>
# 1     D   110     3
# 2     D   111     3
# 3     I  2012     3
# 4     I  2013     2
# 5     I  2014     1
# 6     S  1000     3
# 7     S  2000     3

The 1st column holds the name of your factor variable, the 2nd column the unique values of each variable, and the 3rd column the count.
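
In tidyr 1.0.0 and later, `gather()` is superseded by `pivot_longer()`. Assuming a recent tidyr and dplyr are installed, the same reshape-then-count idea could be sketched as follows (the name `result` is only used here for illustration):

```r
library(dplyr)
library(tidyr)

# example data from the question
data <- data.frame(D = rep(c("110", "111"), 3),
                   I = c(rep("2012", 3), "2014", "2013", "2013"),
                   S = rep(c("1000", "2000"), 3))

result <- data %>%
  # stack all columns into name/value pairs
  pivot_longer(everything(), names_to = "name", values_to = "value") %>%
  # count each name/value combination
  count(name, value)

result
```

The result should have the same three columns (name, value, n) as the `gather()` version above.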

I think the easiest way is to use the plyr package:

library(plyr)

count(data$D)
count(data$I)
count(data$S)

It will give you

> count(data$D)
   x freq
1 110    3
2 111    3

> count(data$I)
    x freq
1 2012    3
2 2013    2
3 2014    1

> count(data$S)
    x freq
1 1000    3
2 2000    3

