
How to combine two columns in a dataframe in R?

I have a dataframe "df" like below:

Samples  Status  last_contact_days_to  death_days_to
Sample1  Alive   [Not Available]       [Not Applicable]
Sample2  Dead    [Not Available]       724
Sample3  Dead    [Not Available]       1624
Sample4  Alive   1569                  [Not Applicable]
Sample5  Dead    [Not Available]       2532
Sample6  Dead    [Not Available]       1271

I want to combine the columns last_contact_days_to and death_days_to into a single column that contains only the numeric values, not the placeholder text. If both columns contain placeholder text for a row, that row should be removed entirely.

The result should look like the following:

Samples Status  new_column
Sample2 Dead    724
Sample3 Dead    1624
Sample4 Alive   1569
Sample5 Dead    2532
Sample6 Dead    1271

We can change the [Not Available] and [Not Applicable] entries to NA and then use coalesce, which takes the first non-missing value from the two columns for each row.

library(tidyverse)
df1 %>%
   # funs() is deprecated since dplyr 0.8.0; a formula lambda is the drop-in replacement
   mutate_at(3:4,
      ~ replace(., . %in% c("[Not Available]", "[Not Applicable]"), NA)) %>%
   transmute(Samples, Status,
             new_column = coalesce(last_contact_days_to, death_days_to)) %>%
   filter(!is.na(new_column))
#  Samples Status new_column
#1 Sample2   Dead        724
#2 Sample3   Dead       1624
#3 Sample4  Alive       1569
#4 Sample5   Dead       2532
#5 Sample6   Dead       1271
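
To see why coalesce fits here, the following is a minimal sketch on two toy vectors (the names last_contact and death are illustrative and not part of the question's data). It assumes the placeholders have already been turned into NA. For each position, coalesce returns the first non-missing value, and the result stays NA only where both inputs are NA, which is exactly the row the final filter drops.

```r
library(dplyr)

# Hypothetical toy vectors standing in for the two columns
# after the placeholder strings were replaced by NA
last_contact <- c(NA, NA, "1569")
death        <- c(NA, "724", NA)

# Element-wise: take the first non-NA value across the inputs
combined <- coalesce(last_contact, death)
combined
# position 1 is NA in both inputs, so it remains NA in the result
```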

Note: As @Roland suggested, if columns 3 and 4 contain only numeric values apart from '[Not Available]' and '[Not Applicable]', the mutate_at step can simply apply as.numeric. This converts every non-numeric element to NA (with a coercion warning), so no explicit replacement of the placeholders is needed.

df1 %>%
    mutate_at(3:4, as.numeric) 
    # if the columns are `factor` class, convert with `as.character` first
    # mutate_at(3:4, ~ as.numeric(as.character(.)))

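NOTE: In the OP's dataset, these columns are of factor class, so uncomment the line above and use it instead of applying as.numeric directly. Calling as.numeric on a factor returns its integer level codes, not the values that are displayed.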
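
The factor caveat can be checked with a small sketch. The vector f and the data frame df_f below are made-up stand-ins for the OP's factor columns, not the actual data, and suppressWarnings is used only to silence the expected coercion warning. The block first shows that the direct and the as.character conversions disagree, then runs the whole as.numeric variant of the pipeline on the factor columns.

```r
library(dplyr)

# Hypothetical factor column, e.g. as produced by reading with stringsAsFactors = TRUE
f <- factor(c("[Not Applicable]", "724", "1624"))

codes <- as.numeric(f)                                 # integer level codes, not 724 / 1624
num   <- suppressWarnings(as.numeric(as.character(f))) # placeholder becomes NA, numbers survive

# Hypothetical three-row data frame with factor columns in positions 3 and 4
df_f <- data.frame(
  Samples = c("Sample1", "Sample2", "Sample4"),
  Status  = c("Alive", "Dead", "Alive"),
  last_contact_days_to = factor(c("[Not Available]", "[Not Available]", "1569")),
  death_days_to        = factor(c("[Not Applicable]", "724", "[Not Applicable]"))
)

result <- df_f %>%
  # convert via character so the displayed values, not the level codes, are parsed
  mutate_at(3:4, ~ suppressWarnings(as.numeric(as.character(.)))) %>%
  transmute(Samples, Status,
            new_column = coalesce(last_contact_days_to, death_days_to)) %>%
  # rows where both columns were placeholders are dropped here
  filter(!is.na(new_column))
result
```

With this variant new_column is numeric rather than character, which is usually what later calculations on the day counts need.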

data

df1 <- structure(list(Samples = c("Sample1", "Sample2", "Sample3", "Sample4", 
"Sample5", "Sample6"), Status = c("Alive", "Dead", "Dead", "Alive", 
"Dead", "Dead"), last_contact_days_to = c("[Not Available]", 
"[Not Available]", "[Not Available]", "1569", "[Not Available]", 
"[Not Available]"), death_days_to = c("[Not Applicable]", "724", 
"1624", "[Not Applicable]", "2532", "1271")), .Names = c("Samples", 
"Status", "last_contact_days_to", "death_days_to"), 
 class = "data.frame", row.names = c(NA, 
-6L))


 