I have a data frame in R like this:
D I S ...
110 2012 1000
111 2012 2000
110 2012 1000
111 2014 2000
110 2013 1000
111 2013 2000
I want to count how often each level occurs for each factor and save the result in a data frame like this:
D    Count  I     Count  S     Count ...
110  3      2012  3      1000  3
111  3      2013  2      2000  3
            2014  1
or this:
D Count
110 3
111 3
I Count
2012 3
2013 2
2014 1
S Count
1000 3
2000 3
....
I tried sapply, levels, dplyr and aggregate, but none of them produced the desired output. How can I do that?
Here is a solution using data.table:
data <- data.frame(D = rep(c("110", "111"), 3),
I = c(rep("2012", 3), "2014", "2013", "2013"),
S = rep(c("1000", "2000"), 3))
str(data)
# you just want
table(data$D)
table(data$I)
table(data$S)
# one option using data.table
require(data.table)
dt <- as.data.table(data)
dt # see dt
dt[, table(D)] # or dt[, .N, by = D], for one variable
paste(names(dt), "Count", sep = "_") # names of new count columns
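# look up each row's value in its column's count table, so every new column has nrow(dt) entries
# (assigning the raw table would need recycling, which recent data.table versions reject)
dt[, paste(names(dt), "Count", sep = "_") := lapply(.SD, function(x) as.vector(table(x)[as.character(x)]))]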
dt # new dt
data2 <- as.data.frame(dt)[, sort(names(dt))]
data2 # final data frame
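For the first (side-by-side) layout from the question, the count tables have different lengths, so they need padding before they can sit in one data frame. A minimal base R sketch, assuming the same example data as above; the helper `pad` and the `_Count` column suffix are only illustrative names:

```r
data <- data.frame(D = rep(c("110", "111"), 3),
                   I = c(rep("2012", 3), "2014", "2013", "2013"),
                   S = rep(c("1000", "2000"), 3))

# one count table per column
tabs <- lapply(data, table)
n_max <- max(lengths(tabs))  # the longest table decides the number of rows

# hypothetical helper: pad a vector with NA up to n_max entries
pad <- function(x) c(x, rep(NA, n_max - length(x)))

# build a (value, count) pair of columns per variable and bind them side by side
wide <- do.call(cbind, lapply(names(tabs), function(nm) {
  out <- data.frame(pad(names(tabs[[nm]])), pad(as.vector(tabs[[nm]])))
  names(out) <- c(nm, paste(nm, "Count", sep = "_"))
  out
}))
wide
```

Shorter variables are filled with NA at the bottom, which mirrors the blank cells in the desired output.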
And a dplyr one for the second output:
library(dplyr) # provides %>%
counts <- data %>%
lapply(table) %>%
lapply(as.data.frame)
counts
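If a single data frame is preferred over a list of per-variable tables, the pieces can be stacked in base R. This is only a sketch under the assumption that the example data above is used; the column names `name`, `value` and `Count` are chosen here for illustration:

```r
data <- data.frame(D = rep(c("110", "111"), 3),
                   I = c(rep("2012", 3), "2014", "2013", "2013"),
                   S = rep(c("1000", "2000"), 3))

# one count data frame per column; naming the table argument labels the value column
counts <- lapply(data, function(x) {
  as.data.frame(table(value = x), stringsAsFactors = FALSE)
})

# add the variable name to each piece and stack them into one long data frame
long <- do.call(rbind, Map(function(nm, df) {
  data.frame(name = nm, value = df$value, Count = df$Freq)
}, names(counts), counts))
rownames(long) <- NULL
long
```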
I think the most efficient way to do it, in terms of code length and of storing the final output in a tidy format, is this:
library(tidyverse)
# example data
data <- data.frame(D = rep(c("110", "111"), 3),
I = c(rep("2012", 3), "2014", "2013", "2013"),
S = rep(c("1000", "2000"), 3))
data %>%
gather(name, value) %>% # reshape dataset to long format
count(name, value) # count combinations
# # A tibble: 7 x 3
# name value n
# <chr> <chr> <int>
# 1 D 110 3
# 2 D 111 3
# 3 I 2012 3
# 4 I 2013 2
# 5 I 2014 1
# 6 S 1000 3
# 7 S 2000 3
The 1st column holds the name of your factor variable, the 2nd column the unique values of each variable, and the 3rd column the count.
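In current tidyr releases `gather()` is marked as superseded in favour of `pivot_longer()`. The same idea could be sketched as follows, assuming dplyr and tidyr are installed and using the same example data:

```r
library(dplyr)
library(tidyr)

data <- data.frame(D = rep(c("110", "111"), 3),
                   I = c(rep("2012", 3), "2014", "2013", "2013"),
                   S = rep(c("1000", "2000"), 3))

long_counts <- data %>%
  pivot_longer(everything(), names_to = "name", values_to = "value") %>% # reshape to long format
  count(name, value)                                                     # count combinations
long_counts
```

All columns hold character values here, so they can share one `value` column without any type conversion.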
I think the easiest way is to use the plyr package.
library(plyr)
count(data$D)
count(data$I)
count(data$S)
It will give you
> count(data$D)
x freq
1 110 3
2 111 3
> count(data$I)
x freq
1 2012 3
2 2013 2
3 2014 1
> count(data$S)
x freq
1 1000 3
2 2000 3