Replace specific values in a data.frame with values from another data.frame sequentially in R

I have a data.frame (df1) and I want to include a single, most recent age for each of my samples from another data.frame (df2):

df1$age <- df2$age_9[match(df1$Sample_ID, df2$Sample_ID)]

The problem is that df2 has 9 age columns, one per check-up visit (age_1 is the age at the first visit, age_9 is the age at the 9th visit), and patients don't attend every visit.

How do I add the age from the most recent non-empty check-up?

That is, if age_9 == "." replace "." with age_8, then if age_8 == "." replace "." with age_7, and so on.

From this:

View(df1)
Sample Age
1      50
2      .
3      .

To:

View(df1)
Sample Age
1      50
2      49
3      30

Using the data in df2:

View(df2)
Sample Age_1 Age_2 Age_3
1      40    42    44
2      35    49    .
3      30    .     .

This is my attempt:

df1$age[which(df1$age == ".")] <- df2$age_8[match(df1$Sample_ID, df2$Sample_ID)]

With base R, we can use max.col to get, for each row, the index of the last 'Age' column that is not ".". We then cbind that with the sequence of row numbers to build a row/column index, extract those elements, and use them to replace the 'Age' values in 'df1' that are ".". Note that this assumes 'df1' and 'df2' list the same samples in the same row order.

# Matrix index (row, last column that is not ".") picks each row's latest age
df1$Age <- ifelse(df1$Age == ".",
    df2[-1][cbind(seq_len(nrow(df2)), max.col(df2[-1] != ".", "last"))],
    df1$Age)

df1 <- type.convert(df1, as.is = TRUE)

Output:

df1
#  Sample Age
#1      1  50
#2      2  49
#3      3  30
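
The one-liner above pairs row i of 'df1' with row i of 'df2'. If the two data frames might list samples in a different order, the same idea can be combined with match, as in the question's own attempt. The following is only a sketch under that assumption: the shuffled 'df2' and the helper names ages, last_col, latest, idx and missing_age are made up for illustration.

```r
# Example data from the post, with df2's rows deliberately shuffled (assumption)
df1 <- data.frame(Sample = 1:3, Age = c("50", ".", "."),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Sample = c(3L, 1L, 2L),
                  Age_1 = c(30L, 40L, 35L),
                  Age_2 = c(".", "42", "49"),
                  Age_3 = c(".", "44", "."),
                  stringsAsFactors = FALSE)

# Character matrix of the age columns (everything except Sample)
ages <- as.matrix(df2[-1])

# For each row of df2, index of the last column that is not "."
last_col <- max.col(ages != ".", ties.method = "last")

# Most recent recorded age for each row of df2
latest <- ages[cbind(seq_len(nrow(ages)), last_col)]

# Align df2's rows to df1 by Sample, then fill only the missing ages
idx <- match(df1$Sample, df2$Sample)
missing_age <- df1$Age == "."
df1$Age[missing_age] <- latest[idx[missing_age]]

df1 <- type.convert(df1, as.is = TRUE)
df1
```

Assigning only into the positions where Age is "." also avoids the length mismatch that arises when a full-length match result is assigned into a subset.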

Or, using the tidyverse, reshape 'df2' into 'long' format, keep the last row for each 'Sample' with slice_tail, and then join the result to 'df1':

library(dplyr)
library(tidyr)
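df2 %>% 
    # "." becomes NA here (as.integer emits a coercion warning)
    mutate(across(starts_with('Age'), as.integer)) %>%
    # one row per Sample/visit, dropping the missing visits
    pivot_longer(cols = starts_with('Age'), values_drop_na = TRUE) %>%
    group_by(Sample) %>% 
    # the last remaining row per Sample is the most recent visit
    slice_tail(n = 1) %>% 
    ungroup() %>% 
    select(-name) %>%
    right_join(df1, by = "Sample") %>%
    # keep df1's own Age where present, otherwise use the value from df2
    transmute(Sample, Age = coalesce(as.integer(Age), value))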

Output:

# A tibble: 3 x 2
#  Sample   Age
#   <int> <int>
#1      1    50
#2      2    49
#3      3    30
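
The fallback described in the question (try the latest visit, then the one before it, and so on) can also be written without reshaping, by passing the age columns to coalesce in newest-first order. This is a sketch rather than part of the original answer; the names ages, latest and own_age are illustrative, and it assumes the age columns in 'df2' are already ordered from oldest to newest.

```r
library(dplyr)

df1 <- data.frame(Sample = 1:3, Age = c("50", ".", "."),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Sample = 1:3,
                  Age_1 = c(40L, 35L, 30L),
                  Age_2 = c("42", "49", "."),
                  Age_3 = c("44", ".", "."),
                  stringsAsFactors = FALSE)

# Turn "." into NA and make every age column an integer vector
ages <- lapply(df2[-1], function(x) as.integer(na_if(as.character(x), ".")))

# coalesce() returns the first non-missing value, so pass the newest column first
latest <- do.call(coalesce, unname(rev(ages)))

# Keep df1's own age where present, otherwise use the latest age from df2
own_age <- as.integer(na_if(df1$Age, "."))
df1$Age <- coalesce(own_age, latest[match(df1$Sample, df2$Sample)])
df1
```

Replacing "." with NA before calling as.integer avoids the coercion warnings, and match keeps the lookup correct even if the rows of 'df2' are in a different order.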

Data:

df1 <- structure(list(Sample = 1:3, Age = c("50", ".", ".")),
    class = "data.frame", row.names = c(NA, -3L))

df2 <- structure(list(Sample = 1:3, Age_1 = c(40L, 35L, 30L),
    Age_2 = c("42", "49", "."), Age_3 = c("44", ".", ".")),
    class = "data.frame", row.names = c(NA, -3L))
