简体   繁体   中英

concatenating column names with column data in R (using data.table)

I have a data.table as follows,

library(data.table)

dt<-structure(list(varx = c(0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L
), vary = c(0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L)), class = c("data.table", 
"data.frame"), row.names = c(NA, -10L))
dt
    varx vary
 1:    0    0
 2:    1    0
 3:    0    0
 4:    0    0
 5:    1    1
 6:    0    0
 7:    1    1
 8:    0    0
 9:    0    0
10:    0    0

and I am trying to get the following output:

dt 
    varx    vary
1:  varx_n  vary_n
2:  varx_y  vary_n
3:  varx_n  vary_n
4:  varx_n  vary_n
5:  varx_y  vary_y
6:  varx_n  vary_n
7:  varx_y  vary_y
8:  varx_n  vary_n
9:  varx_n  vary_n
10: varx_n  vary_n

using the following code:

dt[,lapply(.SD, function(x){
  ifelse(x==1,paste0(.SD,"_y"),paste0(.SD,"_n"))
})]

However, I am not getting the desired output. Please help.

Use Map and a bit of factor labelling to pair each variable name with the n/y label required.

dt[, Map(paste, names(dt), lapply(.SD,factor,labels=c("n","y")), sep="_")]

#      varx   vary
# 1: varx_n vary_n
# 2: varx_y vary_n
# 3: varx_n vary_n
# 4: varx_n vary_n
# 5: varx_y vary_y
# 6: varx_n vary_n
# 7: varx_y vary_y
# 8: varx_n vary_n
# 9: varx_n vary_n
#10: varx_n vary_n

The following works:

dt[ , lapply(setNames(nm = names(.SD)), function(nm_j) 
  sprintf('%s_%s', nm_j, c('n', 'y')[.SD[[nm_j]] + 1L]))]
#       varx   vary
#  1: varx_n vary_n
#  2: varx_y vary_n
#  3: varx_n vary_n
#  4: varx_n vary_n
#  5: varx_y vary_y
#  6: varx_n vary_n
#  7: varx_y vary_y
#  8: varx_n vary_n
#  9: varx_n vary_n
# 10: varx_n vary_n

The problem with your approach is that, in lapply(.SD, ...) , in the scope of FUN the name of the current list element (ie, the column name) is unknown. To get around this, we loop over column names whereby we can give ourselves access to both the column names and the contents of the columns.

The setNames part is just for convenience, it can easily be broken out if you find it too code-golfy -- it will create an object c(varx = 'varx', vary = 'vary') , which lets the output automatically get the right names. If we do lapply(names(.SD), ...) , we'll have to clean up the column names afterwards.

c('n', 'y')[idx + 1L] is a bit of a murky way of saying ifelse(idx, 'y', 'n') (one of the places where 0-based indexing would be nice); it can be replaced with that as you see fit. If your data is massive, you'll notice my version is faster .

in base R :

dt[dt==0] <- "_n" 
dt[dt=="1"] <- "_y" 
dt[] <- Map(paste0,names(dt),dt)
#       varx   vary
#  1: varx_n vary_n
#  2: varx_y vary_n
#  3: varx_n vary_n
#  4: varx_n vary_n
#  5: varx_y vary_y
#  6: varx_n vary_n
#  7: varx_y vary_y
#  8: varx_n vary_n
#  9: varx_n vary_n
# 10: varx_n vary_n

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM