In R is there a way to recode the columns from one data frame with values from another data frame?

I am still relatively new to working in R and I am not sure how to approach this problem. Any help or advice is greatly appreciated!!!

The problem I have is that I am working with two data frames and I need to recode the first data frame with values from the second. The first data frame (df1) contains the responses to a survey, and the other data frame (df2) is the data dictionary for df1.

The data looks like this:

df1 <-  data.frame(a = c(1,2,3), 
            b = c(4,5,6), 
            c = c(7,8,9))

df2 <- data.frame(columnIndicator = c("a","a","a","b","b","b","c","c","c" ),
              df1_value = c(1,2,3,4,5,6,7,8,9),
              new_value = c("a1","a2","a3","b1","b2","b3","c1","c2","c3"))

So far I can manually recode df1 to get the expected output by doing this:

df1 <- within(df1,{
  a[a==1] <- "a1"
  a[a==2] <- "a2"
  a[a==3] <- "a3"
  b[b==4] <- "b1"
  b[b==5] <- "b2"
  b[b==6] <- "b3"
  c[c==7] <- "c1"
  c[c==8] <- "c2"
  c[c==9] <- "c3"
})

However, my real dataset has about 42 columns that need to be recoded, and doing it by hand is time-intensive. Is there another way in R for me to recode the values in df1 with the values in df2?

Thanks!

You just need to reshape the data a bit: melt df1 to long format, join it to df2, then cast the result back to wide.

library(data.table)
df1 <-  data.frame(a = c(1,2,3), 
                   b = c(4,5,6), 
                   c = c(7,8,9))

df2 <- data.frame(columnIndicator = c("a","a","a","b","b","b","c","c","c" ),
                  df1_value = c(1,2,3,4,5,6,7,8,9),
                  new_value = c("a1","a2","a3","b4","b5","b6","c7","c8","c9"),stringsAsFactors = FALSE)



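# Convert both data frames to data.tables by reference
setDT(df1)
setDT(df2)

# Row ID so the long table can be cast back to wide later
df1[,ID:=.I]

# Long format: one row per (ID, column, value)
ldf1 <- melt(df1,measure.vars = c("a","b","c"),variable.name = "columnIndicator",value.name = "df1_value")

# Update join: pull new_value from df2 by column name and original value
ldf1[df2,"new_value":=i.new_value,on=.(columnIndicator,df1_value)]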

ldf1
#>    ID columnIndicator df1_value new_value
#> 1:  1               a         1        a1
#> 2:  2               a         2        a2
#> 3:  3               a         3        a3
#> 4:  1               b         4        b4
#> 5:  2               b         5        b5
#> 6:  3               b         6        b6
#> 7:  1               c         7        c7
#> 8:  2               c         8        c8
#> 9:  3               c         9        c9

dcast(ldf1,ID~columnIndicator,value.var = "new_value")
#>    ID  a  b  c
#> 1:  1 a1 b4 c7
#> 2:  2 a2 b5 c8
#> 3:  3 a3 b6 c9

Created on 2020-04-18 by the reprex package (v0.3.0)
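
Since the real data has about 42 columns, typing every name into measure.vars gets tedious. Here is a minimal sketch of the same melt / join / dcast idea, assuming every column other than the ID helper should be recoded and that those columns share one type. Passing id.vars = "ID" lets melt() pick up all remaining columns. The variable.factor = FALSE argument and the names `long` and `wide` are my own additions for illustration, not part of the answer above.

```r
library(data.table)

df1 <- data.table(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))
df2 <- data.table(
  columnIndicator = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  df1_value = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  new_value = c("a1", "a2", "a3", "b4", "b5", "b6", "c7", "c8", "c9")
)

# Row ID so the long table can be cast back to wide
df1[, ID := .I]

# With id.vars given and measure.vars omitted, melt() uses all other columns
long <- melt(df1, id.vars = "ID",
             variable.name = "columnIndicator",
             value.name = "df1_value",
             variable.factor = FALSE)  # keep the join key as character

# Update join against the dictionary, then cast back to wide
long[df2, new_value := i.new_value, on = .(columnIndicator, df1_value)]
wide <- dcast(long, ID ~ columnIndicator, value.var = "new_value")
wide[, ID := NULL]  # drop the helper column
wide
```

Any value that has no entry in df2 ends up as NA in `wide`, which makes gaps in the dictionary easy to spot.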

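In base R, we can unlist df1, match its values against df1_value, and take the corresponding new_value. Note that this matches on the value alone and ignores columnIndicator, so it is only safe when each df1_value appears once in df2.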

df1[] <- df2$new_value[match(unlist(df1), df2$df1_value)]
df1

#   a  b  c
#1 a1 b1 c1
#2 a2 b2 c2
#3 a3 b3 c3
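
If the same code can appear in more than one column (which is common in survey data), the lookup has to respect columnIndicator as well. Here is a minimal base R sketch under that assumption. `recode_column` is a hypothetical helper name of mine, and `recoded` is just an illustrative result name.

```r
df1 <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))
df2 <- data.frame(
  columnIndicator = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  df1_value = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  new_value = c("a1", "a2", "a3", "b1", "b2", "b3", "c1", "c2", "c3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: recode one column using only the dictionary
# rows that belong to that column
recode_column <- function(values, column_name, dictionary) {
  rows <- dictionary[dictionary$columnIndicator == column_name, ]
  rows$new_value[match(values, rows$df1_value)]
}

# Apply the helper to every column; `recoded[] <-` keeps the data frame shape
recoded <- df1
recoded[] <- lapply(names(df1), function(col) recode_column(df1[[col]], col, df2))
recoded
```

Values that are missing from the dictionary come back as NA rather than being silently matched to another column's label.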

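Is this what you are looking for?

library(dplyr)
library(tidyr)  # gather() is exported by tidyr, not dplyr

# Reshape df1 to long format: one row per (column, value) pair
df3 <- df1 %>% gather(key = "key", value = "value")

# Look up each pair in the data dictionary
df3 %>% inner_join(df2, by = c("key" = "columnIndicator", "value" = "df1_value"))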

Output

  key value new_value
1   a     1        a1
2   a     2        a2
3   a     3        a3
4   b     4        b1
5   b     5        b2
6   b     6        b3
7   c     7        c1
8   c     8        c2
9   c     9        c3
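
The output above is still in long format. To get back to the shape of df1, the joined table needs a row identifier and a reshape to wide. Here is a rough sketch, assuming tidyr 1.0.0 or later, where pivot_longer() and pivot_wider() supersede gather() and spread(). The `row_id` column and the `recoded` name are my own illustrative additions.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))
df2 <- data.frame(
  columnIndicator = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  df1_value = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  new_value = c("a1", "a2", "a3", "b1", "b2", "b3", "c1", "c2", "c3"),
  stringsAsFactors = FALSE
)

recoded <- df1 %>%
  # Helper column so each respondent's row can be rebuilt after the join
  mutate(row_id = row_number()) %>%
  # Long format: one row per (row_id, column, value)
  pivot_longer(cols = -row_id, names_to = "key", values_to = "value") %>%
  # left_join keeps values missing from the dictionary (as NA)
  left_join(df2, by = c("key" = "columnIndicator", "value" = "df1_value")) %>%
  select(row_id, key, new_value) %>%
  # Back to wide: one column per original df1 column
  pivot_wider(names_from = key, values_from = new_value) %>%
  select(-row_id)

recoded
```

I used left_join here rather than inner_join so that a value with no dictionary entry shows up as NA instead of dropping out of the result.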
