
How do I replace values across multiple columns in a data-frame with values from a second column, based on a match with a third column using R?

I am working with a single data frame in R containing the following character columns and values.

C1<-c("1","2","3","4","5")
C2<-c("x", "t", "u", "r", "j")
C3<-c("2","5","3","1","4")
C4<-c("3","1","NA", "2","5")
df<-data.frame(C1,C2,C3,C4)

I am trying to write code that will replace values in C3 and C4 as follows:

  1. For each value in C3, find the same value in C1.
  2. Replace the value in C3 with the value in C2 from the row where C1 matches. For example, in C3, "2" (the first value) would be replaced with "t", "5" would be replaced with "j", "3" would be replaced with "u", and so forth.
  3. Repeat the same procedure for values in C4.
  4. Skip any cells with an NA in C3 or C4.

The initial dataframe looks like this:

[image: initial data frame]

The final dataframe should look like this:

[image: updated data frame]

I have yet to come up with code (base R or dplyr) that accomplishes this task. If anyone can lend assistance, I would really appreciate it.

Thanks!

Update: this is a new df that I've tried to manipulate with the code provided by respondents (e.g., df[c("C3", "C4")] <- lapply(df[c("C3", "C4")], function(x) df$C2[match(x, df$C1)])).

I am getting all NAs in C3 and C4 and cannot understand why. There are matches between C3 and C1.


We can use match

df[c("C3", "C4")] <- lapply(df[c("C3", "C4")], function(x) df$C2[match(x, df$C1)])
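To see why this works, here is a rough breakdown of the same lookup for a single column, using the sample data from the question. The name `idx` is introduced here only for illustration, and `stringsAsFactors = FALSE` is added on the assumption that the columns should stay character on R versions before 4.0.

```r
C1 <- c("1", "2", "3", "4", "5")
C2 <- c("x", "t", "u", "r", "j")
C3 <- c("2", "5", "3", "1", "4")
C4 <- c("3", "1", "NA", "2", "5")
df <- data.frame(C1, C2, C3, C4, stringsAsFactors = FALSE)

# match() returns, for each value of C3, the position of its first match in C1
idx <- match(df$C3, df$C1)
idx
# [1] 2 5 3 1 4

# Indexing C2 by those positions performs the lookup
df$C2[idx]
# [1] "t" "j" "u" "x" "r"

# The string "NA" in C4 has no match in C1, so match() returns a real NA there
match(df$C4, df$C1)
# [1]  3  1 NA  2  5
```

Because an unmatched value already yields NA, cells without a match in C1 come out as NA without any special handling.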

I also used match, but split it into two separate statements to make it clearer what is going on:

# Create sample data
C1<-c("1","2","3","4","5")
C2<-c("x", "t", "u", "r", "j")
C3<-c("2","5","3","1","4")
C4<-c("3","1","NA", "2","5")
df<-data.frame(C1,C2,C3,C4)

# Make replacements
df$C3_mod <- ifelse(is.na(df$C3), df$C3, df$C2[match(df$C3, df$C1)])
df$C4_mod <- ifelse(is.na(df$C4), df$C4, df$C2[match(df$C4, df$C1)])

# View results
df
#   C1 C2 C3 C4 C3_mod C4_mod
# 1  1  x  2  3      t      u
# 2  2  t  5  1      j      x
# 3  3  u  3 NA      u   <NA>
# 4  4  r  1  2      x      t
# 5  5  j  4  5      r      j

Using match with a matrix:

cols <- c('C3', 'C4')
df[cols] <- df$C2[match(as.matrix(df[cols]), df$C1)]
df

#  C1 C2 C3   C4
#1  1  x  t    u
#2  2  t  j    x
#3  3  u  u <NA>
#4  4  r  x    t
#5  5  j  r    j
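As far as I understand it, the matrix version works because match() treats the matrix as one long vector in column-major order, so the result has one entry per cell (all of C3 first, then all of C4). A small sketch of that intermediate step follows; `m`, `idx`, and `looked_up` are names made up for this illustration.

```r
df <- data.frame(
  C1 = c("1", "2", "3", "4", "5"),
  C2 = c("x", "t", "u", "r", "j"),
  C3 = c("2", "5", "3", "1", "4"),
  C4 = c("3", "1", "NA", "2", "5"),
  stringsAsFactors = FALSE
)
cols <- c("C3", "C4")

m <- as.matrix(df[cols])   # 5 x 2 character matrix
idx <- match(m, df$C1)     # plain vector: C3's positions, then C4's
length(idx)
# [1] 10

# Rebuilding the matrix shape explicitly makes the column-wise fill visible
looked_up <- matrix(df$C2[idx], nrow = nrow(m), dimnames = dimnames(m))
looked_up
```

Assigning the length-10 vector to df[cols] fills the two columns in that same order, which is why the one-liner can skip the explicit matrix() step.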

I solved the issue with my NA values. It turns out that I had whitespace in the column values that I hadn't accounted for. Again, thanks to everyone for their responses. I learned a lot in the process.
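For anyone hitting the same all-NA result, here is a minimal sketch of how stray whitespace breaks the match and one way to clean it up with base R's trimws(). The data frame `df2` and its values are hypothetical, since the actual data from the update is not shown.

```r
# Hypothetical data: C3 has stray spaces, so nothing matches C1 exactly
df2 <- data.frame(
  C1 = c("1", "2", "3"),
  C2 = c("x", "t", "u"),
  C3 = c(" 2", "3 ", " 1 "),
  stringsAsFactors = FALSE
)

match(df2$C3, df2$C1)
# [1] NA NA NA

# Strip leading and trailing whitespace from the key and lookup columns first
df2[c("C1", "C3")] <- lapply(df2[c("C1", "C3")], trimws)

df2$C3 <- df2$C2[match(df2$C3, df2$C1)]
df2$C3
# [1] "t" "u" "x"
```

match() compares strings exactly, so " 2" and "2" count as different values until the padding is removed.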
