I have a data frame that contains patient survival data, with one column for time to last follow-up and one column for time to death. If the patient died, the time is recorded in the death column and not in the follow-up column, and vice versa if the patient is still alive. The unused column (for a living patient, the death column) contains the character string "[Not Available]" instead of a time. Here is an example:
follow up          death
100                [Not Available]
[Not Available]    300
2000               [Not Available]
I want to conditionally merge the two columns into a single column keeping just the numerical values like this:
Time
100
300
2000
EDIT
To make this more broadly applicable, including to some other datasets I have, suppose the "[Not Available]" placeholder is not consistent: it could be NA, na, [Not available], null, etc. How would I write a conditional statement to merge the columns in that case? I'm imagining an if statement that keeps the numerical values and ignores the various character strings. Of course, in a data frame column that mixes numbers and strings, every value is stored as character, which makes this a little harder. Ideas?
We can use coalesce from the dplyr package.
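library(dplyr)
# tibble() replaces the deprecated data_frame()
dt <- tibble(`follow up` = c(1000, NA, 2000),
             death = c(NA, 300, NA))
# coalesce() returns the first non-missing value across its arguments
dt2 <- dt %>%
  mutate(Time = coalesce(`follow up`, death))
dt2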
# A tibble: 3 x 3
`follow up` death Time
<dbl> <dbl> <dbl>
1 1000 NA 1000
2 NA 300 300
3 2000 NA 2000
Here is an option with base R
dt$Time <- do.call(pmax, c(dt, na.rm = TRUE))
dt$Time
#[1] 1000 300 2000
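For the inconsistent-placeholder case raised in the EDIT, one possible sketch in base R is to convert each column to numeric first, so that every non-numeric string becomes NA regardless of its spelling, and then pick whichever value is present. The data frame `df`, the helper `to_num`, and the placeholder spellings below are made up for illustration.

```r
# Hypothetical data: character columns with several placeholder spellings
df <- data.frame(
  follow_up = c("1000", "[Not Available]", "2000", "na"),
  death     = c("NA", "300", "null", "[Not available]"),
  stringsAsFactors = FALSE
)

# Helper (illustrative name): anything that does not parse as a number becomes NA.
# as.character() guards against factor columns; suppressWarnings() hides the
# expected "NAs introduced by coercion" warning.
to_num <- function(x) suppressWarnings(as.numeric(as.character(x)))

follow_up_num <- to_num(df$follow_up)
death_num     <- to_num(df$death)

# Use the death time where it exists, otherwise the follow-up time
df$Time <- ifelse(is.na(death_num), follow_up_num, death_num)
df$Time
```

Rows where neither column holds a number (the fourth row here) end up as NA in Time rather than being silently filled, which makes them easy to find afterwards.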
You can use dplyr's vectorized if_else function to achieve the effect that you need. See the dplyr documentation for if_else.
Try the below:
library(tidyverse)
# tibble() replaces the deprecated data_frame()
t1 <- tibble(follow_up = c(1000, NA, 2000),
             death = c(NA, 300, NA))
# Test for missingness with is.na(); comparing against 'NA' returns NA for missing values
t2 <- t1 %>%
  mutate(Time = if_else(!is.na(death), death, follow_up))
Result:
  follow_up death  Time
      <dbl> <dbl> <dbl>
1      1000    NA  1000
2        NA   300   300
3      2000    NA  2000
This answer does not use logical operators or if statements (if you can provide an answer that does, I would greatly appreciate it), but it works:
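# fixed = TRUE matches the placeholder literally instead of as a regex character class
Data2$followup <- gsub("[Not Available]", "", Data2$followup, fixed = TRUE)
Data2$death <- gsub("[Not Available]", "", Data2$death, fixed = TRUE)
# One of the two strings is now empty, so pasting them leaves just the number
Data2$time <- paste(Data2$followup, Data2$death, sep = "")
Data2$time <- as.numeric(Data2$time)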
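A short illustration of why the brackets matter in the gsub pattern: without fixed = TRUE, "[Not Available]" is read as a regular-expression character class, so gsub deletes the individual characters inside the brackets and leaves the literal brackets behind. The variable names here are only for the example.

```r
placeholder <- "[Not Available]"

# Regex interpretation: removes the characters N, o, t, space, A, v, a, i, l, b, e
# wherever they occur, but not the literal "[" and "]"
regex_result <- gsub("[Not Available]", "", placeholder)

# Literal interpretation: removes the whole placeholder string
fixed_result <- gsub("[Not Available]", "", placeholder, fixed = TRUE)

# Digits are not in the character class, so numeric strings pass through unchanged
number_result <- gsub("[Not Available]", "", "2000")

regex_result   # "[]"
fixed_result   # ""
number_result  # "2000"
```

The regex form also strips those letters from any other text in the column, so the literal match is the safer choice when the column might contain values other than numbers and the placeholder.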
Converting both columns to numeric, replacing the placeholder with 0, and then summing the two columns should give the desired output.
> ss <- data.frame(follow_up = c('100','[Not Available]','2000'),death = c('[Not Available]','300','[Not Available]'), stringsAsFactors = FALSE)
>
> ss <- lapply(ss, function(x){ifelse(x == '[Not Available]', 0, as.numeric(x))})
Warning messages:
1: In ifelse(x == "[Not Available]", 0, as.numeric(x)) :
NAs introduced by coercion
2: In ifelse(x == "[Not Available]", 0, as.numeric(x)) :
NAs introduced by coercion
>
> ss$new <- ss$follow_up + ss$death
>
> data.frame(ss)
follow_up death new
1 100 0 100
2 0 300 300
3 2000 0 2000
>
Use apply:
df <- data.frame("follow up" = c("1000", "[Not Available]", "2000"),
"death" = c("[Not Available]", "300", "[Not Available]"))
df$Time <- apply(df, 1, function(row) as.numeric(row[row!="[Not Available]"]))