I'm trying to wrap my head around how to use lapply to recode several variables while pasting the last part of each variable name into the recoded string.
Building on this post, I know that I can recode several variables at a time:
d2 <- lapply(d1, FUN=function(X) recode(X, "'Somewhat interested' ='Somewhat'; 'Not interested' = 'No'"))
But what I need to do is slightly different. Suppose that my data frame has sequentially labeled variables, e.g. var_1, var_2, var_3, and looks like this:
var_1 var_2 var_3 var_4
1:
2: Somewhat interested Somewhat interested Somewhat interested Not interested
3: Somewhat interested Somewhat interested Somewhat interested Not interested
4: Not interested Somewhat interested Somewhat interested Somewhat interested
I want to recode the variable and append the sequential identifier of the column name:
var_1 var_2 var_3 var_4
1:
2: Somewhat 1 Somewhat 2 Somewhat 3 No 4
3: Somewhat 1 Somewhat 2 Somewhat 3 No 4
4: No 1 Somewhat 2 Somewhat 3 Somewhat 4
Any thoughts on how to combine recode and paste?
You can iterate over the column names themselves with sapply() instead of passing the data frame to lapply(). (I had to remake the data by hand, so this works with the version I have.)
So
d2 <- lapply(d1, FUN=function(X) recode(X, "'Somewhat interested' ='Somewhat'; 'Not interested' = 'No'"))
turns into
d2 <- sapply(colnames(d1), FUN=function(X) recode(d1[,X], "'Somewhat interested' ='Somewhat'; 'Not interested' = 'No'"))
where d1[,X] selects the column that the function is applied to.
Next, to append the column suffix, we can build the recode string with paste0(), so that
"'Somewhat interested' ='Somewhat'; 'Not interested' = 'No'"
is replaced by
paste0("'Somewhat interested' ='Somewhat ",X ,"'; 'Not interested' = 'No ", X,"'")
However, this still doesn't do exactly what you want: X is the full column name, so the result carries the prefix as well as the suffix (e.g. 'Somewhat var_1'). We therefore need to strip the prefix, and we can use substr() for that:
substr(X, 5, nchar(X))
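As a small illustration of the suffix extraction, here is a sketch using hypothetical column names that follow the question's naming pattern. The sub() variant is an alternative I am adding, not part of the original answer; it may be handy if the prefix is not always four characters long.

```r
# Hypothetical column names following the question's "var_N" pattern
cols <- c("var_1", "var_2", "var_10")

# Fixed-position extraction: keep everything from the 5th character on,
# which assumes the prefix is exactly "var_" (four characters)
suffix_substr <- substr(cols, 5, nchar(cols))

# Pattern-based extraction: drop everything up to and including
# the last underscore, whatever the prefix length
suffix_sub <- sub("^.*_", "", cols)

print(suffix_substr)
print(suffix_sub)
```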
All together now:
d2 <- sapply(colnames(d1), FUN=function(X) recode(d1[,X], paste0("'Somewhat interested' ='Somewhat ",substr(X, 5, nchar(X)) ,"'; 'Not interested' = 'No ", substr(X, 5, nchar(X)),"'")))
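For anyone who wants to try the idea without the recode() dependency, here is a minimal base-R sketch of the same approach. Note that it swaps recode() for a named lookup vector, so it is not the code above, just the same logic under stated assumptions: the data frame d1 below is made up to mirror the question, and the names labels and suffix are mine.

```r
# Hypothetical data mirroring the question's example (plain data.frame)
d1 <- data.frame(
  var_1 = c("Somewhat interested", "Somewhat interested", "Not interested"),
  var_2 = c("Somewhat interested", "Somewhat interested", "Somewhat interested"),
  var_3 = c("Somewhat interested", "Somewhat interested", "Somewhat interested"),
  var_4 = c("Not interested", "Not interested", "Somewhat interested"),
  stringsAsFactors = FALSE
)

# Lookup table: names are the old values, elements are the new labels
labels <- c("Somewhat interested" = "Somewhat", "Not interested" = "No")

d2 <- sapply(colnames(d1), function(X) {
  suffix <- sub("^.*_", "", X)      # "var_1" -> "1"
  # d1[[X]] extracts one column by name; indexing the lookup vector by
  # those values translates them, and paste() appends the suffix.
  # Values missing from the lookup would come back as NA.
  paste(labels[d1[[X]]], suffix)
})

print(d2)
```

As with the sapply() call above, the result is a character matrix with one column per variable.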
You can just use regex:
mtx1 <- sapply(seq_along(df), function(x){gsub('interested', x, df[,x])})
mtx1
# [,1] [,2] [,3] [,4]
# [1,] "Somewhat 1" "Somewhat 2" "Somewhat 3" "Not 4"
# [2,] "Somewhat 1" "Somewhat 2" "Somewhat 3" "Not 4"
# [3,] "Not 1" "Somewhat 2" "Somewhat 3" "Somewhat 4"
Admittedly, it leaves "Not" instead of "No", but you can either use a more complicated regex or just change it in a separate step:
apply(mtx1, 2, function(x){gsub('Not', 'No', x)})
# [,1] [,2] [,3] [,4]
# [1,] "Somewhat 1" "Somewhat 2" "Somewhat 3" "No 4"
# [2,] "Somewhat 1" "Somewhat 2" "Somewhat 3" "No 4"
# [3,] "No 1" "Somewhat 2" "Somewhat 3" "Somewhat 4"
Wrap the result in as.data.frame (or your favorite equivalent) if you need a data.frame instead of a matrix.
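Putting the steps above together, a self-contained sketch might look like this. The data frame df is made up to mirror the question, and the names mtx2 and res are mine; restoring the column names at the end is an extra step I am assuming you would want, since the matrix does not carry them.

```r
# Hypothetical data mirroring the question's example (plain data.frame,
# character columns; df[, x] would behave differently on other table classes)
df <- data.frame(
  var_1 = c("Somewhat interested", "Somewhat interested", "Not interested"),
  var_2 = c("Somewhat interested", "Somewhat interested", "Somewhat interested"),
  var_3 = c("Somewhat interested", "Somewhat interested", "Somewhat interested"),
  var_4 = c("Not interested", "Not interested", "Somewhat interested"),
  stringsAsFactors = FALSE
)

# Step 1: replace "interested" with the column position
mtx1 <- sapply(seq_along(df), function(x) gsub("interested", x, df[, x]))

# Step 2: shorten "Not" to "No", column by column
mtx2 <- apply(mtx1, 2, function(x) gsub("Not", "No", x))

# Step 3: convert back to a data.frame and restore the original names
res <- as.data.frame(mtx2, stringsAsFactors = FALSE)
names(res) <- names(df)

print(res)
```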
Note that if your data is stored as factors, it will be more efficient to run the same regex on the levels instead of on the actual data.
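A rough sketch of the factor case, using a made-up factor f and a hypothetical column position x: the regex touches only the levels, however many rows there are.

```r
# Hypothetical factor column and its position in the data frame
f <- factor(c("Somewhat interested", "Not interested",
              "Somewhat interested", "Somewhat interested"))
x <- 1

# Rewrite the levels only; the underlying integer codes stay untouched
levels(f) <- sub("^Not", "No", gsub("interested", x, levels(f)))

print(f)
```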