I have a data.table as follows:
library(data.table)
dt<-structure(list(varx = c(0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L
), vary = c(0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L)), class = c("data.table",
"data.frame"), row.names = c(NA, -10L))
dt
varx vary
1: 0 0
2: 1 0
3: 0 0
4: 0 0
5: 1 1
6: 0 0
7: 1 1
8: 0 0
9: 0 0
10: 0 0
and I am trying to get the following output:
dt
varx vary
1: varx_n vary_n
2: varx_y vary_n
3: varx_n vary_n
4: varx_n vary_n
5: varx_y vary_y
6: varx_n vary_n
7: varx_y vary_y
8: varx_n vary_n
9: varx_n vary_n
10: varx_n vary_n
using the following code:
dt[,lapply(.SD, function(x){
ifelse(x==1,paste0(.SD,"_y"),paste0(.SD,"_n"))
})]
However, I am not getting the desired output. Please help.
Use Map and a bit of factor labelling to pair each variable name with the n/y label required.
dt[, Map(paste, names(dt), lapply(.SD,factor,labels=c("n","y")), sep="_")]
# varx vary
# 1: varx_n vary_n
# 2: varx_y vary_n
# 3: varx_n vary_n
# 4: varx_n vary_n
# 5: varx_y vary_y
# 6: varx_n vary_n
# 7: varx_y vary_y
# 8: varx_n vary_n
# 9: varx_n vary_n
#10: varx_n vary_n
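To see what the factor labelling step does on its own, here is a minimal sketch on a single vector. The vector x and the hard-coded name "varx" are illustrative stand-ins for one column of dt, not part of the original answer.

```r
# Hypothetical 0/1 vector standing in for one column of dt
x <- c(0L, 1L, 0L, 1L)

# factor() sorts the unique values (0, 1) into levels and
# relabels them in that order: 0 becomes "n", 1 becomes "y"
f <- factor(x, labels = c("n", "y"))

# paste() converts the factor to its labels before joining
# them to the column name
res <- paste("varx", f, sep = "_")
res
# [1] "varx_n" "varx_y" "varx_n" "varx_y"
```

Note that this relies on both 0 and 1 occurring in the column. With only one distinct value, factor() would see a single level and reject the two labels; passing levels = 0:1 explicitly should remove that assumption.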
The following works:
dt[ , lapply(setNames(nm = names(.SD)), function(nm_j)
sprintf('%s_%s', nm_j, c('n', 'y')[.SD[[nm_j]] + 1L]))]
# varx vary
# 1: varx_n vary_n
# 2: varx_y vary_n
# 3: varx_n vary_n
# 4: varx_n vary_n
# 5: varx_y vary_y
# 6: varx_n vary_n
# 7: varx_y vary_y
# 8: varx_n vary_n
# 9: varx_n vary_n
# 10: varx_n vary_n
The problem with your approach is that in lapply(.SD, ...) the name of the current list element (i.e., the column name) is unknown within the scope of FUN. To get around this, we loop over the column names instead, which gives us access to both the names and the contents of the columns.
The setNames part is just for convenience; it can easily be broken out if you find it too code-golfy. It creates an object c(varx = 'varx', vary = 'vary'), which lets the output automatically get the right names. If we do lapply(names(.SD), ...), we'll have to clean up the column names afterwards.
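The effect of setNames on the output names can be checked outside of data.table. This is a small sketch; cols and the use of toupper as the looped function are placeholders chosen for illustration.

```r
# Placeholder column names, matching those in dt
cols <- c("varx", "vary")

# setNames(nm = cols) names each element after itself,
# giving c(varx = "varx", vary = "vary")
named <- setNames(nm = cols)

# lapply() keeps the names of its input, so the result is already labelled
with_names <- lapply(named, toupper)
names(with_names)
# [1] "varx" "vary"

# Looping over the bare character vector returns an unnamed list
without_names <- lapply(cols, toupper)
names(without_names)
# NULL
```

Inside dt[ , ...], an unnamed list like the second one ends up with default column names (V1, V2), which is the clean-up the answer refers to.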
c('n', 'y')[idx + 1L] is a bit of a murky way of saying ifelse(idx, 'y', 'n') (one of the places where 0-based indexing would be nice); it can be replaced with that as you see fit. If your data is massive, you'll notice my version is faster.
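As a quick check that the two spellings agree, here is a sketch on a made-up 0/1 vector (idx is illustrative, not taken from dt):

```r
# Hypothetical 0/1 indicator vector
idx <- c(0L, 1L, 0L, 0L, 1L)

# Shift 0/1 to 1/2 so it can index into the two-element label vector
by_index <- c("n", "y")[idx + 1L]

# ifelse() coerces the integers to logical: 0 is FALSE, 1 is TRUE
by_ifelse <- ifelse(idx, "y", "n")

identical(by_index, by_ifelse)
# [1] TRUE
```

Both forms assume the column holds only 0 and 1. Other values would behave differently in each: ifelse treats any non-zero number as TRUE, while the indexing form would return NA for positions beyond 2.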
In base R:
dt[dt==0] <- "_n"
dt[dt=="1"] <- "_y"
dt[] <- Map(paste0,names(dt),dt)
# varx vary
# 1: varx_n vary_n
# 2: varx_y vary_n
# 3: varx_n vary_n
# 4: varx_n vary_n
# 5: varx_y vary_y
# 6: varx_n vary_n
# 7: varx_y vary_y
# 8: varx_n vary_n
# 9: varx_n vary_n
# 10: varx_n vary_n