
How to reshape and summarise categorical data from long to wide?

My data frame looks like this:

db <- data.frame(var1 = c("A", "B", "C", "D", "E"), var2 = c("X", "X", "Y", "Y", "Y"),
           var3 = c("G", "H", "G", "G", "K"))
db

  var1 var2 var3
    A    X    G
    B    X    H
    C    Y    G
    D    Y    G
    E    Y    K

I'd like to reshape based on var2 and count the occurrences of each var3 value to get this result:

  var2 var3.G var3.H var3.K
    X    1      1      0
    Y    2      0      1

I have tried the cast and the reshape functions with no success.

The xtabs function is reasonably simple to use. The only cognitive jump is to realize that the formula has no left-hand side (LHS) unless you want to sum a third variable:

> xtabs( ~var2+var3, data=db)
    var3
var2 G H K
   X 1 1 0
   Y 2 0 1

You don't want to call as.data.frame on this, since it will convert the table to long form, but you can use as.data.frame.matrix on it, since a two-way R 'table' is stored as a matrix underneath.
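As a minimal sketch of that last step, assuming the db data frame from the question: the names tab and wide are only illustrative, and the renaming at the end is just one way to get the var3.* column names the question asks for.

```r
db <- data.frame(var1 = c("A", "B", "C", "D", "E"),
                 var2 = c("X", "X", "Y", "Y", "Y"),
                 var3 = c("G", "H", "G", "G", "K"))

# Two-way contingency table: var2 in rows, var3 in columns
tab <- xtabs(~ var2 + var3, data = db)

# Keep the wide layout; as.data.frame(tab) would return long form instead
wide <- as.data.frame.matrix(tab)

# Prefix the count columns and move the row names into a var2 column
names(wide) <- paste("var3", names(wide), sep = ".")
wide <- cbind(var2 = rownames(wide), wide)
rownames(wide) <- NULL
wide
```

The result should be an ordinary data frame with one row per var2 value and one count column per var3 value.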

tbl <- data.frame( var2 = db[,2], var3 = paste("var3", db[,3], sep = "."))
table(tbl)
    var3
var2 var3.G var3.H var3.K
   X      1      1      0
   Y      2      0      1

One more option, using the super useful data.table package:

library(data.table)

db <- data.table(var1 = c("A", "B", "C", "D", "E"), var2 = c("X", "X", "Y", "Y", "Y"),
           var3 = c("G", "H", "G", "G", "K"))

# Call the dcast generic (it dispatches to the data.table method) and spell out fun.aggregate
dcast(db, var2 ~ var3, fun.aggregate = length, value.var = "var3")
   var2 G H K
1:    X 1 1 0
2:    Y 2 0 1

Here is another way to go about it:

You can use a combination of t() and table().

db <- data.frame(var1 = c("A", "B", "C", "D", "E"), 
                 var2 = c("X", "X", "Y", "Y", "Y"),
                 var3 = c("G", "H", "G", "G", "K"))
db

t(table(db$var3,db$var2))
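To see what this produces, here is a small check, assuming the same db as above (a and b are illustrative names). Transposing the var3-by-var2 table should give the same counts as tabulating var2 against var3 directly, so the t() call is optional:

```r
db <- data.frame(var1 = c("A", "B", "C", "D", "E"),
                 var2 = c("X", "X", "Y", "Y", "Y"),
                 var3 = c("G", "H", "G", "G", "K"))

# Transposed table: var2 ends up in rows, var3 in columns
a <- t(table(db$var3, db$var2))

# Same layout without the transpose
b <- table(db$var2, db$var3)

# Compare the counts cell by cell
all(a == b)
```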

