Applying the same factor levels to multiple variables in an R data frame

I am working with a dataset that includes 16 questions where the response set is identical (Yes, No, Unknown or Missing). I am processing the data using R and I want to turn each of the variables into a factor. For a single variable, I could use the following construction:

df <- read.csv("thedata.csv")
df$q1 <- factor(x=df$q1,levels=c(-9,0,1),
                        labels=c("Unknown or Missing","No","Yes"))

I'd like to avoid typing that 16 times. I could do it with a for() loop, but I was wondering if there is a clearer, more idiomatic R way to do it. Some sample data:

df <- structure(list(q1 = c(0, 0, 0, -9, 0), q2 = c(0, 0, 1, 0, 0),
                     q3 = c(0, 0, 1, 0, 0), q4 = c(1, 1, 0, 0, 0),
                     q5 = c(0, 1, 1, 1, 1), q6 = c(1, 1, 1, 0, 0),
                     q7 = c(0, 0, 0, 1, 0), q8 = c(0, 0, 1, 1, 1),
                     q9 = c(1, 0, -9, 1, 0), q10 = c(1, 0, 0, 0, 0),
                     q11 = c(0, 1, 1, 0, 0), q12 = c(1, 1, 0, 0, 0),
                     q13 = c(1, -9, 1, 0, 0), q14 = c(0, 0, 0, 1, 1),
                     q15 = c(1, 0, 1, 1, 0), q16 = c(1, 1, 1, 1, 1)),
                .Names = c("q1", "q2", "q3", "q4", "q5", "q6", "q7",
                           "q8", "q9", "q10", "q11", "q12", "q13",
                           "q14", "q15", "q16"),
                row.names = c(NA, -5L), class = "data.frame")

# Convert every column to a factor; assigning into df[] keeps df a data frame
df[] <- lapply(df, factor, 
              levels=c(-9, 0, 1), 
              labels = c("Unknown or Missing", "No", "Yes"))
str(df)

This is likely to be faster than apply or sapply, which need data.frame to reform/reclass their results. The trick here is that using [] on the LHS of the assignment preserves the structure of the target (R "knows" what its class and dimensions are), so there is no need to call data.frame on the list returned by lapply. If you want to do this only with selected columns, you could do this:

 df[colnums] <- lapply(df[colnums], factor, 
              levels=c(-9, 0, 1), 
              labels = c("Unknown or Missing", "No", "Yes"))
 str(df)
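
The difference between assigning into a subset of the data frame and keeping the bare lapply result, along with one possible way to build colnums, can be sketched as follows. The small data frame `demo`, the helper vectors `lev` and `lab`, and the use of grep() to pick the columns are assumptions made for illustration; the original does not say how colnums was defined.

```r
# Illustrative data only: an id column plus two question columns
demo <- data.frame(id = 1:3, q1 = c(0, -9, 1), q2 = c(1, 0, 0))

lev <- c(-9, 0, 1)
lab <- c("Unknown or Missing", "No", "Yes")

# Assumption: the question columns are those whose names start with "q"
colnums <- grep("^q", names(demo))

# On its own, lapply returns a plain list, not a data frame
as_list <- lapply(demo[colnums], factor, levels = lev, labels = lab)
class(as_list)

# Assigning into demo[colnums] keeps the data frame and converts
# only the selected columns; id is left untouched
demo[colnums] <- lapply(demo[colnums], factor, levels = lev, labels = lab)
str(demo)
```

A character vector of column names would work in place of the numeric positions as well, since both are valid indices for `[`.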

An R base solution using apply

 data.frame(apply(df, 2, factor, 
                 levels=c(-9, 0, 1), 
                 labels = c("Unknown or Missing", "No", "Yes")))

Using sapply

data.frame(sapply(df, factor, levels=c(-9, 0, 1), 
         labels = c("Unknown or Missing", "No", "Yes")))
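
One caveat with the apply and sapply versions: both simplify their result to a matrix, and a matrix cannot hold factors, so data.frame() has to rebuild the columns from the labels, and the requested level set may not survive. A small check along these lines makes that visible. The names `small`, `via_sapply` and `via_lapply` are assumptions for illustration, not part of the original answers.

```r
# Illustrative data only; q2 never takes the value -9
small <- data.frame(q1 = c(0, -9, 1), q2 = c(1, 0, 0))

lev <- c(-9, 0, 1)
lab <- c("Unknown or Missing", "No", "Yes")

# sapply simplifies to a matrix before data.frame() sees the result
via_sapply <- data.frame(sapply(small, factor, levels = lev, labels = lab))

# lapply with [] assignment keeps each column as the factor that was built
via_lapply <- small
via_lapply[] <- lapply(small, factor, levels = lev, labels = lab)

# Compare the level sets of a column in which one level is unused
levels(via_sapply$q2)
levels(via_lapply$q2)
```

What the sapply version returns depends on the R version's stringsAsFactors default, so it is worth running str() on the result before relying on it.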
