Recoding a large number of variables using another data frame in R

Question

I'd like to use a data frame (Df2) to recode the variables of another data frame (Df1), so that the end result is a data frame that contains text like local/international rather than 1s/2s (Df3). Missingness is present in the Df1 data frame, and I'd like to make sure it's represented as NA.

This is a minimal working example; the actual data set contains more than a hundred variables (all of which are of the character class), each with between one and fifteen levels. Any help would be much appreciated.

Starting point (dfs)

Df1 <- data.frame("buyer_Q1"=c(1,2,1,1),"seller_Q2"=c(2,1,3,2),"price_Q1_2"=c(2,5,7,5))
Df2 <- data.frame("NameOfVariable"=c("buyer_Q1","buyer_Q1","seller_Q2","seller_Q2","seller_Q2","price_Q1_2","price_Q1_2","price_Q1_2"),"VariableLevel"=c(1,2,1,2,3,2,5,7),"VariableDef"=c("local","internat","local","internat","NA","50-100K","100-200K","200+K"))

Desired outcome (df)

Df3 <- data.frame("buyer_Q1"=c("local","internat","local","local"),"seller_Q2"=c("internat","local","NA","internat"),"price_Q1_2"=c("50-100K","100-200K","200+K","100-200K"))

My thinking so far (not really code): if a row of Df2's NameOfVariable matches a variable name in Df1, and the same row's VariableLevel matches an observation of that Df1 variable, then put that row's VariableDef into Df1. I'm wondering whether if statements can do this.

if (Df2["NameOfVariable"]==names(Df1))
{
  if (Df2["VariableLevel"]==Df1[ ])
  {
   Df1[ ] <- paste0("VariableDef") 
  }
}

Answer 1

Here is one method in base R using match and Map. Map applies a function to corresponding elements of its list arguments. Here there are two: Df1 (a data frame is a list of columns) and a list built from the second and third columns of Df2, split by the first column. The second list is reordered to match the order of the names in Df1.

The applied function matches the elements of a Df1 column against the levels in the corresponding lookup (y[[1]]) and uses the resulting positions to index that lookup's definitions (y[[2]]). Map returns a list, which is converted to a data frame with data.frame.

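# x is one column of Df1; y is its lookup: y[[1]] = levels, y[[2]] = definitions
data.frame(Map(function(x, y) y[[2]][match(x, y[[1]])],
               Df1,
               # one lookup per variable, ordered like the columns of Df1
               split(Df2[2:3], Df2[1])[names(Df1)]))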

This returns:

  buyer_Q1 seller_Q2 price_Q1_2
1    local  internat    50-100K
2 internat     local   100-200K
3    local        NA      200+K
4    local  internat   100-200K
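To see what the one-liner does, it can help to run the two steps by hand for a single column. The sketch below is only an illustration: it rebuilds the example data with stringsAsFactors = FALSE (an assumption, so that the labels are plain character vectors on any R version), and the names lookups and idx are made up here.

```r
Df1 <- data.frame(buyer_Q1 = c(1, 2, 1, 1),
                  seller_Q2 = c(2, 1, 3, 2),
                  price_Q1_2 = c(2, 5, 7, 5))
Df2 <- data.frame(
  NameOfVariable = c("buyer_Q1", "buyer_Q1", "seller_Q2", "seller_Q2",
                     "seller_Q2", "price_Q1_2", "price_Q1_2", "price_Q1_2"),
  VariableLevel = c(1, 2, 1, 2, 3, 2, 5, 7),
  VariableDef = c("local", "internat", "local", "internat", "NA",
                  "50-100K", "100-200K", "200+K"),
  stringsAsFactors = FALSE)

# Step 1: one small lookup table per variable, in the column order of Df1
lookups <- split(Df2[2:3], Df2$NameOfVariable)[names(Df1)]

# Step 2: for one column, find where each code sits in the lookup's levels ...
idx <- match(Df1$seller_Q2, lookups$seller_Q2$VariableLevel)
idx
# [1] 2 1 3 2

# ... and use those positions to pull out the definitions
lookups$seller_Q2$VariableDef[idx]
# [1] "internat" "local"    "NA"       "internat"
```

A code that has no row in Df2, or an observation that is already NA, gets no match, so match returns NA for it and the recoded value is a real NA.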

Answer 2

A solution using a loop and factors. Be careful: the result looks equivalent to Df3, but it is not identical, because the function fun returns a data frame whose columns are factors. If needed, you can convert them to character.

Df1 <- data.frame("buyer_Q1"=c(1,2,1,1),"seller_Q2"=c(2,1,3,2),"price_Q1_2"=c(2,5,7,5))
Df2 <- data.frame("NameOfVariable"=c("buyer_Q1","buyer_Q1","seller_Q2","seller_Q2","seller_Q2","price_Q1_2","price_Q1_2","price_Q1_2"),"VariableLevel"=c(1,2,1,2,3,2,5,7),"VariableDef"=c("local","internat","local","internat","NA","50-100K","100-200K","200+K"))
Df3 <- data.frame("buyer_Q1"=c("local","internat","local","local"),"seller_Q2"=c("internat","local","NA","internat"),"price_Q1_2"=c("50-100K","100-200K","200+K","100-200K"))

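fun <- function(df, mdf) {
  for (varn in names(df)) {
    # rows of the lookup table that describe this variable
    dat <- mdf[mdf$NameOfVariable == varn & !is.na(mdf$VariableDef),]
    # recode: VariableLevel gives the levels, VariableDef their labels
    df[[varn]] <- factor(df[[varn]], dat$VariableLevel, dat$VariableDef)
  }
  return(df)
}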

fun(Df1, Df2)
Df3
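If character columns are wanted, the factors can be converted afterwards. The sketch below is one possible way, not part of the answer itself: it repeats the data and fun so that it runs on its own, and the name res is made up. Note that in the example data the definition "NA" is a string rather than a missing value, so !is.na() keeps it; the last step, which turns that string into a real NA, assumes this is what the question wants.

```r
Df1 <- data.frame(buyer_Q1 = c(1, 2, 1, 1),
                  seller_Q2 = c(2, 1, 3, 2),
                  price_Q1_2 = c(2, 5, 7, 5))
Df2 <- data.frame(
  NameOfVariable = c("buyer_Q1", "buyer_Q1", "seller_Q2", "seller_Q2",
                     "seller_Q2", "price_Q1_2", "price_Q1_2", "price_Q1_2"),
  VariableLevel = c(1, 2, 1, 2, 3, 2, 5, 7),
  VariableDef = c("local", "internat", "local", "internat", "NA",
                  "50-100K", "100-200K", "200+K"),
  stringsAsFactors = FALSE)

fun <- function(df, mdf) {
  for (varn in names(df)) {
    dat <- mdf[mdf$NameOfVariable == varn & !is.na(mdf$VariableDef), ]
    df[[varn]] <- factor(df[[varn]], dat$VariableLevel, dat$VariableDef)
  }
  df
}

res <- fun(Df1, Df2)
sapply(res, class)  # every column is a factor at this point

# Convert each column to character and turn the literal "NA" into a real NA;
# assigning into res[] keeps the data frame structure
res[] <- lapply(res, function(x) {
  x <- as.character(x)
  x[x == "NA"] <- NA
  x
})
```

After this, is.na(res$seller_Q2) is TRUE only for the third row.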

Answer 3

A solution using dplyr and tidyr. The code still works if warning messages appear; they arise when the text columns of the data frames are factors (the default for data.frame before R 4.0.0). If you don't want to see any warnings, set stringsAsFactors = FALSE when creating the data frames, as in the DATA section below.

library(dplyr)
library(tidyr)

Df3 <- Df1 %>%
  mutate(ID = 1:n()) %>%
  gather(NameOfVariable, VariableLevel, -ID) %>%
  left_join(Df2, by = c("NameOfVariable", "VariableLevel")) %>%
  select(-VariableLevel) %>%
  spread(NameOfVariable, VariableDef) %>%
  select(-ID)

Df3
  buyer_Q1 price_Q1_2 seller_Q2
1    local    50-100K  internat
2 internat   100-200K     local
3    local      200+K        NA
4    local   100-200K  internat

DATA

Df1 <- data.frame("buyer_Q1"=c(1,2,1,1),
                  "seller_Q2"=c(2,1,3,2),
                  "price_Q1_2"=c(2,5,7,5),
                  stringsAsFactors = FALSE)
Df2 <- data.frame("NameOfVariable"=c("buyer_Q1","buyer_Q1","seller_Q2","seller_Q2","seller_Q2","price_Q1_2","price_Q1_2","price_Q1_2"),
                  "VariableLevel"=c(1,2,1,2,3,2,5,7),
                  "VariableDef"=c("local","internat","local","internat","NA","50-100K","100-200K","200+K"),
                  stringsAsFactors = FALSE)
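gather and spread still work, but tidyr (1.0.0 and later) recommends pivot_longer and pivot_wider for new code. The following is a sketch of the same pipeline with those functions, assuming a reasonably recent tidyr and dplyr; it is an alternative to the code above, not a change to it. The result is a tibble, and the columns come back in their original order.

```r
library(dplyr)
library(tidyr)

Df1 <- data.frame(buyer_Q1 = c(1, 2, 1, 1),
                  seller_Q2 = c(2, 1, 3, 2),
                  price_Q1_2 = c(2, 5, 7, 5))
Df2 <- data.frame(
  NameOfVariable = c("buyer_Q1", "buyer_Q1", "seller_Q2", "seller_Q2",
                     "seller_Q2", "price_Q1_2", "price_Q1_2", "price_Q1_2"),
  VariableLevel = c(1, 2, 1, 2, 3, 2, 5, 7),
  VariableDef = c("local", "internat", "local", "internat", "NA",
                  "50-100K", "100-200K", "200+K"),
  stringsAsFactors = FALSE)

Df3 <- Df1 %>%
  mutate(ID = row_number()) %>%                       # remember the original row
  pivot_longer(-ID, names_to = "NameOfVariable",
               values_to = "VariableLevel") %>%       # one row per cell
  left_join(Df2, by = c("NameOfVariable", "VariableLevel")) %>%
  select(-VariableLevel) %>%
  pivot_wider(names_from = NameOfVariable,
              values_from = VariableDef) %>%          # back to one row per ID
  select(-ID)
```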

3 answers

solution1
1 ACCEPTED 2017-10-03 13:46:59

solution2
0 2017-10-03 13:59:22

solution3
0 2017-10-03 14:43:26
