
R: What is an efficient way to recode variables? How do I prorate means?

I was wondering if anyone could point me in the right direction for recoding multiple variables with the same rules. I have the following data frame, bhs1 (called df in the code below):

df <- structure(list(bhs1_1 = c(NA, 1, NA, 2, 1, 2), bhs1_2 = c(NA, 
2, NA, 2, 1, 1), bhs1_3 = c(NA, 1, NA, 2, 2, 2), bhs1_4 = c(NA, 
2, NA, 1, 1, 1), bhs1_5 = c(NA, 1, NA, 1, 2, 2), bhs1_6 = c(NA, 
1, NA, 2, 1, 2), bhs1_7 = c(NA, 1, NA, 1, 2, 1), bhs1_8 = c(NA, 
2, NA, 2, 2, 2), bhs1_9 = c(NA, 1, NA, 2, 1, 1), bhs1_10 = c(NA, 
2, NA, 1, 2, 2), bhs1_11 = c(NA, 2, NA, 2, 2, 1), bhs1_12 = c(NA, 
2, NA, 2, 1, 1), bhs1_13 = c(NA, 1, NA, 1, 2, 2), bhs1_14 = c(NA, 
2, NA, 2, 1, 1), bhs1_15 = c(NA, 1, NA, 2, 2, 2), bhs1_16 = c(NA, 
2, NA, 2, 2, 2), bhs1_17 = c(NA, 2, NA, 2, 2, 1), bhs1_18 = c(NA, 
1, NA, 1, 2, 1), bhs1_19 = c(NA, 1, NA, 2, 1, 2), bhs1_20 = c(NA, 
2, NA, 2, 1, 1)), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame")) 

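There are two recoding rules, each covering about half of the columns:

Rule 1, for bhs1_2, bhs1_4, bhs1_7, bhs1_9, bhs1_11, bhs1_12, bhs1_14, bhs1_16, bhs1_17, bhs1_18 and bhs1_20: a value of 1 becomes 1 and anything else becomes 0, i.e. if_else(x == 1, 1, 0)

Rule 2, for bhs1_1, bhs1_3, bhs1_5, bhs1_6, bhs1_8, bhs1_10, bhs1_13, bhs1_15 and bhs1_19: a value of 2 becomes 1 and anything else becomes 0, i.e. if_else(x == 2, 1, 0)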

Is there an elegant way to write code to meet this use case? If so, can someone please point me in the right direction and/or provide me with a sample?

Here's a solution using dplyr:

library(dplyr)
case1 <- vars(bhs1_2, bhs1_4, bhs1_7, bhs1_9, bhs1_11, bhs1_12, bhs1_14, bhs1_16, bhs1_17, 
  bhs1_18, bhs1_20) 
case2 <- vars(bhs1_1, bhs1_3, bhs1_5, bhs1_6, bhs1_8, bhs1_10, bhs1_13, 
  bhs1_15, bhs1_19)
result <- df %>%
  mutate_at(case1, ~ (. == 1) * 1L) %>%
  mutate_at(case2, ~ (. == 2) * 1L)

Note: I skipped the if_else() call. I'm just testing your condition and then converting the TRUE/FALSE results to numbers by multiplying by 1L. I'm also not sure how you want NAs handled; as written, NA stays NA, because NA == 1 evaluates to NA.
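If NAs should become 0 instead, one option is dplyr's if_else() with its missing argument. The sketch below assumes dplyr 1.0 or later, where across() supersedes mutate_at(), and runs on a small hypothetical excerpt (rows 1, 2 and 4 of the first four columns) rather than on the full data:

```r
library(dplyr)

# Hypothetical excerpt: rows 1, 2 and 4 of the first four columns of the question's data
df <- tibble(
  bhs1_1 = c(NA, 1, 2),
  bhs1_2 = c(NA, 2, 2),
  bhs1_3 = c(NA, 1, 2),
  bhs1_4 = c(NA, 2, 1)
)
case1 <- c("bhs1_2", "bhs1_4")  # rule: 1 -> 1, anything else -> 0
case2 <- c("bhs1_1", "bhs1_3")  # rule: 2 -> 1, anything else -> 0

result <- df %>%
  mutate(
    # if_else() returns `missing` wherever the condition itself is NA
    across(all_of(case1), ~ if_else(.x == 1, 1L, 0L, missing = 0L)),
    across(all_of(case2), ~ if_else(.x == 2, 1L, 0L, missing = 0L))
  )
result
```

Dropping `missing = 0L` keeps NA as NA, which matches the behaviour of the mutate_at() version above.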

If you aren't familiar with the pipe operator (%>%), it takes the value on its left and passes it as the first argument to the function call on its right. It's designed to improve legibility by avoiding deeply nested function calls.
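A small illustration of that equivalence, using magrittr's %>% on an arbitrary numeric vector:

```r
library(magrittr)

x <- c(1, 4, 9)

# Nested form: read from the inside out
nested <- sum(sqrt(x))

# Piped form: x becomes the first argument of sqrt(),
# and that result becomes the first argument of sum()
piped <- x %>% sqrt() %>% sum()

# Any further arguments follow the piped value: v %>% round(1) is round(v, 1)
rounded <- c(1.26, 3.14159) %>% round(1)
```

Both `nested` and `piped` equal 6, and `rounded` is `c(1.3, 3.1)`.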

Alternatively, store the column names in character vectors and convert the logical comparison to integer (0/1) with as.integer():

library(dplyr)
library(magrittr)  # provides the %<>% assignment pipe

case1 <- c("bhs1_2", "bhs1_4", "bhs1_7", "bhs1_9", "bhs1_11", "bhs1_12",
           "bhs1_14", "bhs1_16", "bhs1_17", "bhs1_18", "bhs1_20")

case2 <- c("bhs1_1", "bhs1_3", "bhs1_5", "bhs1_6", "bhs1_8",
           "bhs1_10", "bhs1_13", "bhs1_15", "bhs1_19")

# %<>% pipes df into the chain and assigns the result back to df;
# only the first pipe is %<>%, the rest are ordinary %>%
df %<>%
  mutate_at(case1, ~ as.integer(. == 1)) %>%
  mutate_at(case2, ~ as.integer(. == 2))

df
# A tibble: 6 x 20
#  bhs1_1 bhs1_2 bhs1_3 bhs1_4 bhs1_5 bhs1_6 bhs1_7 bhs1_8 bhs1_9 bhs1_10
#   <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>   <int>
#1     NA     NA     NA     NA     NA     NA     NA     NA     NA      NA
#2      0      0      0      0      0      0      1      1      1       1
#3     NA     NA     NA     NA     NA     NA     NA     NA     NA      NA
#4      1      0      1      1      0      1      1      1      0       0
#5      0      1      1      1      1      0      0      1      1       1
#6      1      1      1      1      1      1      1      1      1       1
# ... with 10 more variables: bhs1_11 <int>, bhs1_12 <int>, bhs1_13 <int>,
#   bhs1_14 <int>, bhs1_15 <int>, bhs1_16 <int>, bhs1_17 <int>, bhs1_18 <int>,
#   bhs1_19 <int>, bhs1_20 <int>
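The code above relies on magrittr's assignment pipe %<>%, which pipes an object into a chain and writes the final result back to it; magrittr's documentation says it should be the first pipe in the chain. A minimal illustration on an arbitrary vector:

```r
library(magrittr)

x <- c(1, 4, 9)

# Shorthand for x <- x %>% sqrt() %>% round()
x %<>% sqrt() %>% round()
x
```

After this, `x` holds `c(1, 2, 3)` without an explicit `x <-`.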

Alternatively, data.table is an efficient option, since := updates the columns by reference instead of copying the data:

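library(data.table)
# setDT() converts df to a data.table in place; := then updates the columns by reference.
# The trailing [] prints the result.
setDT(df)[, (case1) := lapply(.SD, function(x) as.integer(x == 1)), .SDcols = case1
          ][, (case2) := lapply(.SD, function(x) as.integer(x == 2)), .SDcols = case2
          ][]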

NOTE: This doesn't assume that all the values are of the same …
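A commonly used data.table alternative (swapped in here, not part of the answer above) is a loop over set(), which the data.table documentation describes as a low-overhead, loopable version of :=. A sketch on a small hypothetical excerpt (rows 1, 2 and 4 of the first four columns):

```r
library(data.table)

# Hypothetical excerpt of the question's data
dt <- data.table(
  bhs1_1 = c(NA, 1, 2),
  bhs1_2 = c(NA, 2, 2),
  bhs1_3 = c(NA, 1, 2),
  bhs1_4 = c(NA, 2, 1)
)
case1 <- c("bhs1_2", "bhs1_4")  # rule: 1 -> 1, anything else -> 0
case2 <- c("bhs1_1", "bhs1_3")  # rule: 2 -> 1, anything else -> 0

# set() replaces one whole column at a time by reference (no copy of dt),
# so each column can change from double to integer; NA stays NA
for (col in case1) set(dt, j = col, value = as.integer(dt[[col]] == 1))
for (col in case2) set(dt, j = col, value = as.integer(dt[[col]] == 2))
dt
```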

A base R approach using vectorized arithmetic is also fast, but it only works if every non-missing value is 1 or 2:

case1 <- c("bhs1_2", "bhs1_4", "bhs1_7", "bhs1_9", "bhs1_11", "bhs1_12",
           "bhs1_14", "bhs1_16", "bhs1_17", "bhs1_18", "bhs1_20")

case2 <- c("bhs1_1", "bhs1_3", "bhs1_5", "bhs1_6", "bhs1_8", "bhs1_10",
           "bhs1_13", "bhs1_15", "bhs1_19")

# case1 rule: 1 -> 1, 2 -> 0 (NA stays NA)
df[case1] <- abs(df[case1] - 2)
# case2 rule: 1 -> 0, 2 -> 1 (NA stays NA)
df[case2] <- df[case2] - 1
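To see why the arithmetic works, and where it breaks, here it is applied to a single vector. The value 3 is hypothetical and does not occur in the question's data:

```r
# The arithmetic assumes every non-missing value is 1 or 2
x <- c(NA, 1, 2)

case1_rule <- abs(x - 2)  # 1 -> 1, 2 -> 0, NA stays NA
case2_rule <- x - 1       # 1 -> 0, 2 -> 1, NA stays NA

# A hypothetical out-of-range value is silently miscoded rather than flagged:
out_of_range <- abs(3 - 2)  # 1 under the case1 rule, although 3 is not 1
```

If other codes can occur, an explicit equality test such as `x == 1` is the safer choice.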

A simple ifelse() also works. Assuming the OP wants NA converted to 0 along with every other non-matching value:

case1 <- c("bhs1_2", "bhs1_4", "bhs1_7", "bhs1_9", "bhs1_11", "bhs1_12",
           "bhs1_14", "bhs1_16", "bhs1_17", "bhs1_18", "bhs1_20")

case2 <- c("bhs1_1", "bhs1_3", "bhs1_5", "bhs1_6", "bhs1_8", "bhs1_10",
           "bhs1_13", "bhs1_15", "bhs1_19")


df[case1] <- ifelse(!is.na(df[case1]) & df[case1] == 1, 1, 0)
df[case2] <- ifelse(!is.na(df[case2]) & df[case2] == 2, 1, 0)

#Test solution
df[1:7]
#   bhs1_1 bhs1_2 bhs1_3 bhs1_4 bhs1_5 bhs1_6 bhs1_7
# 1      0      0      0      0      0      0      0
# 2      0      0      0      0      0      0      1
# 3      0      0      0      0      0      0      0
# 4      1      0      1      1      0      1      1
# 5      0      1      1      1      1      0      0
# 6      1      1      1      1      1      1      1

**Update:** If NA should be left as is, the is.na() check can be dropped:

df[case1] <- ifelse(df[case1] == 1, 1, 0)
df[case2] <- ifelse(df[case2] == 2, 1, 0)


df[1:7]
#   bhs1_1 bhs1_2 bhs1_3 bhs1_4 bhs1_5 bhs1_6 bhs1_7
# 1     NA     NA     NA     NA     NA     NA     NA
# 2      0      0      0      0      0      0      1
# 3     NA     NA     NA     NA     NA     NA     NA
# 4      1      0      1      1      0      1      1
# 5      0      1      1      1      1      0      0
# 6      1      1      1      1      1      1      1
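For the NA-to-0 variant, a base R idiom not used in the answer above is %in%: `NA %in% 1` is FALSE rather than NA, so missing values fall through to 0 without an explicit is.na() check. A sketch on a small hypothetical excerpt (rows 1, 2 and 4 of the first four columns); note that it returns integer columns, whereas `ifelse(..., 1, 0)` returns doubles:

```r
# Hypothetical excerpt of the question's data
df <- data.frame(
  bhs1_1 = c(NA, 1, 2),
  bhs1_2 = c(NA, 2, 2),
  bhs1_3 = c(NA, 1, 2),
  bhs1_4 = c(NA, 2, 1)
)
case1 <- c("bhs1_2", "bhs1_4")  # rule: 1 -> 1, anything else (including NA) -> 0
case2 <- c("bhs1_1", "bhs1_3")  # rule: 2 -> 1, anything else (including NA) -> 0

# x %in% 1 is never NA, so as.integer() yields only 0 and 1
df[case1] <- lapply(df[case1], function(x) as.integer(x %in% 1))
df[case2] <- lapply(df[case2], function(x) as.integer(x %in% 2))
df
```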
