
Reshape wide data to long with multiple rows using data.table

I have data like this:

#    am     qsec        vs am     gear     carb
# 1:  1 17.36000 0.5384615  1 4.384615 2.923077
# 2:  1 17.02000 1.0000000  1 4.000000 2.000000
# 3:  0 18.18316 0.3684211  0 3.210526 2.736842
# 4:  0 17.82000 0.0000000  0 3.000000 3.000000

and I would like to produce

 #    variable          0          1
 # 1:     qsec 18.1831579 17.3600000
 # 2:     qsec 17.8200000 17.0200000
 # 3:       vs  0.3684211  0.5384615
 # 4:       vs  0.0000000  1.0000000
 # 5:       am  0.0000000  1.0000000
 # <snip>

where the am groups in the input data are used as columns in the output data.

I can do this in multiple steps (shown below under "data out"), but I would like a more idiomatic data.table way. How can I reshape this data with data.table to produce the expected output?

My attempt, with data to reproduce:

library(data.table)
data = setDT(mtcars[7:11])

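# data in: mean and median of every column, by am
# (.SDcols = seq_along(data) also includes am itself, which is why am appears twice)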
tdat = data[, lapply(.SD, function(y){
                      unlist(lapply(c(mean, median), function(f) f(y) ))
                   }),
                  by="am", .SDcols=seq_along(data)
              ]


# data out
m = melt(tdat, id.vars="am")
# r is 0 for the first row of each (am, variable) pair and 1 for the second
m[, r:=duplicated(interaction(am, variable))+0L]
dcast(m, variable + r ~ am, value.var = "value")[, r:=NULL][]

I asked a similar question, but the solution by Akrun given in the comments returns:

dcast( melt(tdat, id.var=1), variable~am, value.var='value')
#Aggregate function missing, defaulting to 'length'
#   variable 0 1
#1:     qsec 2 2
#2:       vs 2 2
#3:       am 2 2
#4:     gear 2 2
#5:     carb 2 2
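The counts appear because, in the melted table, every (variable, am) pair occurs twice (once for the mean and once for the median), so variable ~ am does not pick out a single value and dcast() falls back to length. A minimal check, assuming tdat is rebuilt with the concise mean/median construction from mtcars (so am is not duplicated):

```r
library(data.table)

# Assumption: summary table rebuilt from mtcars, two rows (mean, median) per am group
tdat <- as.data.table(mtcars[7:11])[, lapply(.SD, function(y) c(mean(y), median(y))), by = am]
m <- melt(tdat, id.vars = "am")

# Count how many rows land in each (variable, am) cell of the cast
cell_sizes <- m[, .N, by = .(variable, am)]
print(cell_sizes)  # every cell has N = 2: one mean row and one median row
```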

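This can be solved with data.table's rowid() function. rowid(am) numbers the rows within each am group, so every variable/row-number combination is unique and dcast() no longer needs to aggregate: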

library(data.table)
m <- melt(tdat, id.vars="am")
# rowid(am) becomes a helper column named am in the result, so drop it after casting
dcast(m, variable + rowid(am) ~ am)[, am := NULL][]
    variable          0          1
 1:     qsec 18.1831600 17.3600000
 2:     qsec 17.8200000 17.0200000
 3:       vs  0.3684211  0.5384615
 4:       vs  0.0000000  1.0000000
 5:       am  0.0000000  1.0000000
 6:       am  0.0000000  1.0000000
 7:     gear  3.2105260  4.3846150
 8:     gear  3.0000000  4.0000000
 9:     carb  2.7368420  2.9230770
10:     carb  3.0000000  2.0000000
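To see what rowid() contributes, here is a small sketch on a hand-written am vector. The vector and the variable labels are illustrative assumptions that mimic the order in which melt() stacks the rows (the qsec rows first, then the vs rows), not the full data:

```r
library(data.table)

# Illustrative am column in melt() order: qsec rows first, then vs rows
am       <- c(1, 1, 0, 0, 1, 1, 0, 0)
variable <- rep(c("qsec", "vs"), each = 4)

# rowid() is a running counter within each distinct value of am
ids <- rowid(am)
print(ids)  # 1 2 1 2 3 4 3 4

# Together with variable and am, the counter identifies every row uniquely,
# so dcast() has no duplicate cells left to aggregate
stopifnot(anyDuplicated(data.table(variable, id = ids, am)) == 0L)
```

Because both am groups contain the same number of rows per variable, the qsec rows get ids 1 and 2 in both groups and end up on the same output rows, which is what lines the 0 and 1 columns up.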

Data

library(data.table)
tdat <- fread(
"# i    am     qsec        vs am     gear     carb
# 1:  1 17.36000 0.5384615  1 4.384615 2.923077
# 2:  1 17.02000 1.0000000  1 4.000000 2.000000
# 3:  0 18.18316 0.3684211  0 3.210526 2.736842
# 4:  0 17.82000 0.0000000  0 3.000000 3.000000", 
  drop = 1:2, colClasses = list(integer = c(3, 6))
)

Alternatively, the sample dataset can be produced more concisely, without duplicating the am column:

setDT(mtcars[7:11])[, lapply(.SD, function(y) c(mean(y), median(y))), by = am]
   am     qsec        vs     gear     carb
1:  1 17.36000 0.5384615 4.384615 2.923077
2:  1 17.02000 1.0000000 4.000000 2.000000
3:  0 18.18316 0.3684211 3.210526 2.736842
4:  0 17.82000 0.0000000 3.000000 3.000000
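Combining the concise construction with rowid(), an end-to-end sketch could look like the following. It is a variant, not the answer's exact code: it counts rows with rowid(variable, am), i.e. within each variable/am cell (1 = mean, 2 = median), stores that counter in an explicitly named helper column id (a name chosen here for illustration), and passes value.var so dcast() does not have to guess it:

```r
library(data.table)

# Summary table: mean and median of each column, two rows per am group
tdat <- as.data.table(mtcars[7:11])[, lapply(.SD, function(y) c(mean(y), median(y))), by = am]
m <- melt(tdat, id.vars = "am")

# Number the rows inside each (variable, am) cell: 1 = mean, 2 = median
m[, id := rowid(variable, am)]

# (variable, id) is now unique within each am group, so no aggregation is needed
out <- dcast(m, variable + id ~ am, value.var = "value")
out[, id := NULL]
print(out)
```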


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM