简体   繁体   中英

Conditionally replace matching values from one data.frame to values in another data.frame

I have a df1 that looks like

[df1]

DE201 DE207
A15 A903
A18 A906

...

df1 <- data.frame(DE201 = c("A15", "A18"),
                  DE207 = c("A903", "A906"))

and another df2 that has the dictionary values for those values, row wise

[df2]

module Data_Element_ID Data_Element_Name Answer_Code Answer_Value DE_original
Fall DE201 Injury result A15 Yes DE201_A15
Fall DE201 Injury result A18 No DE201_A18
Fall DE207 Patient activity A903 Ambulating with assistance DE207_A903
Fall DE207 Patient activity A906 Intracranial injury DE207_A906

...

df2 <- data.frame(module = rep("Fall", 4),
                  Data_Element_ID = c(rep("DE201", 2), rep("DE207", 2)),
                  Data_Element_Name = c(rep("Injury result", 2), rep("Patient activity", 2)),
                  Answer_Code = c("A15", "A18", "A903", "A906"),
                  Answer_value = c("Yes", "No", "Ambulating with assistance", "Interracial injury"),
                  DE_original = c("DE201_A15", "DE201_A18", "DE207_A903", "DE207_A906")
                  )

I want to replace all the values in df1, including its column name, to be the matching values in df2 (df1 column names to be matched df2's Data_Element_Name, and df1 values to be matched df2's Answer_Value)

for example, my desired output is something like

Injury result Patient activity
Yes Ambulating with assistance
No Intracranial injury

...

output <- data.frame(Injury_result = c("Yes", "No"),
                    Patient_activity = c("Ambulating with assistance", "Interracial injury"))

I've tried below two methods but none of them worked, and merge/join are not really applicable here because they don't really have the same key/ID to join/merge by, since one is column wise elements, and one is row wise elements. Tried mutate with if else and case_when:

mutate(
    DE201 = case_when(DE201 == df2$Answer_Code, ~df2$Answer_Value, 
                      TRUE ~DE201
                     ),
    DE204 = ifelse(DE204 %in% df2$Answer_Code, df2$Answer_Value, DE204)
)

There might be a more direct way of doing this, but it's not a straightforward task. In this case, I reshaped the data from df1 to a long format using pivot_longer() in order to tie in the data from df2 using left_join() . Then, it's a question of reshaping the data back to a wide format using pivot_wider() for those columns of data you wanted to keep.

Note that I'm adding a row id that I eventually discard because we need to keep track of what lines of data goes where when we are reshaping from wide to long back to wide.

library(dplyr)
library(tidyr)

df1 |>
  mutate(row_id = row_number()) |>
  pivot_longer(-row_id, names_to = "Data_Element_ID", values_to = "Answer_Code") |>
  left_join(df2) |>
  select(row_id, Data_Element_Name, Answer_value) |>
  pivot_wider(names_from = Data_Element_Name, values_from = Answer_value) |>
  select(-row_id)

# A tibble: 2 x 2
  `Injury result` `Patient activity`        
  <chr>           <chr>                     
1 Yes             Ambulating with assistance
2 No              Interracial injury  

Here is a solution with data.table . We create a dictionary using data.table and extracting the key/val pairs. For example, the "Answer_Code" column is mapped to the "Answer_Value" column. Same for "Data_Element_ID" and "Data_Element_Name". I change the names to key and val for clarity.

library(data.table) 
dict <- rbindlist(list(df2[,c("Answer_Code", "Answer_Value")],
                       df2[,c("Data_Element_ID", "Data_Element_Name")]))
colnames(dict) <- c('key', 'val')

The dictionary "dict" looks like this:

     key                        val
1:   A15                        Yes
2:   A18                         No
3:  A903 Ambulating with assistance
4:  A906        Intracranial injury
5: DE201              Injury result
6: DE201              Injury result
7: DE207           Patient activity
8: DE207           Patient activity

Now we can create a function which will take in a column, and replace the code/key with the name/val. We use apply with the margin of 2 (to iterate over columns) for this.

match_function <- function(column){
  dict[match(column, dict$key), MARGIN = 2, drop=F]
}

df3 <- as.data.frame(apply(df1, 2, match_function))

df3 at this point looks like this:

  val                      val.1
1 Yes Ambulating with assistance
2  No        Intracranial injury

Now we just need to change the column names using the same dictionary process.

names(df3) <- dict[match(names(df1), dict$key), 2, drop=F][[1]]

df3

Our results match what you expected.

  Injury result           Patient activity
1           Yes Ambulating with assistance
2            No        Intracranial injury

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM