
Conditionally merge columns in R

I have a data frame that contains patient survival data, with one column for time to last follow-up and one for time to death. If the patient died, the time is recorded in the time-to-death column and not in the follow-up column, and vice versa if the patient is still alive. The column that does not apply (for a living patient, the death column) holds the character string "[Not Available]" instead of a time. Here is an example:

    follow up           death
      1000         [Not Available]
 [Not Available]         300
      2000         [Not Available]

I want to conditionally merge the two columns into a single column keeping just the numerical values like this:

Time
1000
300
2000

EDIT

To make this more broadly applicable, including to some other datasets I have, suppose the "[Not Available]" placeholder is not consistent: it could be NA, na, [Not available], null, etc. How would I write a conditional statement to merge the columns in that case? I'm imagining an if statement that keeps numerical values and ignores the various character strings. Of course, in a data frame column that mixes the two, both the numerical and character values are stored as characters, which makes this a little harder. Ideas?

We can use coalesce from the dplyr package.

library(dplyr)

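# tibble() replaces the deprecated data_frame()
dt <- tibble("follow up" = c(1000, NA, 2000),
             "death" = c(NA, 300, NA))

# coalesce() returns the first non-missing value in each row
dt2 <- dt %>%
  mutate(Time = coalesce(`follow up`, death))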

dt2
# A tibble: 3 x 3
  `follow up` death  Time
        <dbl> <dbl> <dbl>
1        1000    NA  1000
2          NA   300   300
3        2000    NA  2000
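
The example above starts from columns that are already numeric with NA. For the character columns shown in the question, a minimal sketch could first turn the placeholder into NA and then coalesce. It assumes the placeholder is exactly "[Not Available]"; the names dt_chr, dt_num, follow_up and death are only illustrative.

```r
library(dplyr)

# Character columns as in the question; the column names are illustrative
dt_chr <- tibble(follow_up = c("1000", "[Not Available]", "2000"),
                 death     = c("[Not Available]", "300", "[Not Available]"))

dt_num <- dt_chr %>%
  # Replace the placeholder string with NA, then convert to numeric
  mutate(across(c(follow_up, death),
                ~ as.numeric(na_if(.x, "[Not Available]")))) %>%
  # Take the first non-missing value in each row
  mutate(Time = coalesce(follow_up, death))

dt_num$Time
```

This relies on across(), so it needs dplyr 1.0.0 or later.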

Here is an option with base R

dt$Time <- do.call(pmax, c(dt, na.rm = TRUE))
dt$Time
#[1] 1000  300 2000

You can use dplyr's vectorized if_else function to achieve the effect that you need.

Try the below:

library(tidyverse)

t1 <- tibble(follow_up = c(1000, NA, 2000),
             death = c(NA, 300, NA))

# Take death when it is present, otherwise fall back to follow_up
t2 <- t1 %>%
  mutate(Time = if_else(!is.na(death), death, follow_up))

Result:
  follow_up death  Time
      <dbl> <dbl> <dbl>
1      1000    NA  1000
2        NA   300   300
3      2000    NA  2000

This answer does not use logical operators or if statements (if you can provide an answer that does, I would greatly appreciate it), but it works:

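# fixed = TRUE matches the literal string; without it, "[Not Available]"
# is read as a regular-expression character class
Data2$followup <- gsub("[Not Available]", "", Data2$followup, fixed = TRUE)
Data2$death <- gsub("[Not Available]", "", Data2$death, fixed = TRUE)
# Exactly one of the two strings is non-empty in each row
Data2$time <- as.numeric(paste0(Data2$followup, Data2$death))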

Replace the placeholder with 0, convert both columns to numeric, and add them. Since only one of the two columns holds a time in each row, the sum is that time.

> ss <- data.frame(follow_up = c('100','[Not Available]','2000'),
+                  death = c('[Not Available]','300','[Not Available]'),
+                  stringsAsFactors = FALSE)
> 
> # Replace the placeholder with '0' before converting, so as.numeric() does not warn
> ss[] <- lapply(ss, function(x) as.numeric(replace(x, x == '[Not Available]', '0')))
> 
> ss$new <- ss$follow_up + ss$death
> 
> ss
  follow_up death  new
1       100     0  100
2         0   300  300
3      2000     0 2000

Use apply:

df <- data.frame("follow up" = c("1000", "[Not Available]", "2000"),
                 "death"     = c("[Not Available]", "300", "[Not Available]"))

df$Time <- apply(df, 1, function(row) as.numeric(row[row!="[Not Available]"]))
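
For the case in the EDIT, where the placeholder is not consistent, one possible base R sketch is to treat anything that does not parse as a number as missing and then pick the value with ifelse. The helper to_number and the sample data df2 are hypothetical and only serve as illustration.

```r
# Hypothetical helper: keep the value if it parses as a number, otherwise NA
to_number <- function(x) {
  suppressWarnings(as.numeric(trimws(as.character(x))))
}

# Made-up sample data with several different placeholders
df2 <- data.frame(follow_up = c("1000", "na", "2000", "null"),
                  death     = c("[Not available]", "300", "NA", "45.5"),
                  stringsAsFactors = FALSE)

fu <- to_number(df2$follow_up)
de <- to_number(df2$death)

# Use the follow-up time where it is numeric, otherwise the time to death
df2$Time <- ifelse(is.na(fu), de, fu)
df2$Time
```

Rows where neither column parses as a number end up as NA in Time, which makes them easy to find afterwards. Note that as.numeric also accepts strings such as "1e3" or "Inf", so this is only appropriate if such values cannot appear as placeholders.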
