How to compare multiple variables using dplyr

I need a way to analyze my data, and any help would be appreciated. The data look like the following example:

> glimpse(test)
Rows: 559
Columns: 4
$ Host.H <chr> "Human", "Human", "Human", "Human", "Human", "Human", "Human", "Human", "Human", "Human", "Human", "Human", "Hu…
$ Host.I <chr> NA, "Intermediate", "Intermediate", "Intermediate", "Intermediate", "Intermediate", "Intermediate", "Intermedia…
$ Host.B <chr> NA, "Bat", "Bat", "Bat", "Bat", "Bat", "Bat", "Bat", "Bat", "Bat", "Bat", "Bat", "Bat", "Bat", NA, "Bat", "Bat"…
$ Host.C <chr> NA, "Consensus", "Consensus", "Consensus", "Consensus", "Consensus", "Consensus", "Consensus", "Consensus", "Co…

These data correspond to organisms derived from bats, an intermediate host, humans and a replica (Host.B, Host.I, Host.H and Host.C). Not every cell is filled in; some hold unavailable data as NA. My goal is therefore to create a new column called "Type" as follows. If all four variables contain data (Host.B = Bat, Host.I = Intermediate, Host.H = Human and Host.C = Consensus), the row is labelled "Conserved". If some of the variables are missing (for example Host.B = NA, Host.I = Intermediate, Host.H = NA and Host.C = Consensus), it is labelled "Shared". If only one variable has a value (for example Host.B = Bat, Host.I = NA, Host.H = NA and Host.C = NA), it is labelled "Unique".

For this purpose I have designed the following script:

test <- data %>%
  rowwise() %>%
  mutate(Type = case_when(
    all_eq(c(Host.H = Human, Host.C = Consensus, Host.B = Bat, Host.I = Intermediate), na.rm = T ~ "Conserved",
    all_neq(c(Host.H = Human, Host.C = Consensus, Host.B = Bat, Host.I = Intermediate), na.rm = T)) ~ "Unique",
    TRUE ~ "Shared"
  )) %>%
  ungroup()

Unfortunately, it doesn't do what I need. If you know a more workable way to perform this operation, I would greatly appreciate it.

Thanks.

You can use rowSums to count the number of non-NA values in each row of a data frame. Based on that count, you can assign the Type column.

library(dplyr)

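test <- test %>%
  # count = number of non-NA values among the four Host columns in each row
  mutate(count = rowSums(!is.na(.[c('Host.H', 'Host.I', 'Host.B', 'Host.C')])),
         # rows with no data at all (count == 0) match no condition and get NA
         Type = case_when(count == 4 ~ 'Conserved',
                          count > 1 ~ 'Shared',
                          count == 1 ~ 'Unique'))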

You can drop the count column from the output by appending %>% select(-count).
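
To see the approach end to end, here is a minimal sketch on a small made-up data frame. The `toy` and `result` objects and their four rows are hypothetical, invented only for illustration; they are not the asker's data. As a variation on the code above, it selects the columns with across(starts_with("Host.")) instead of naming each one, which assumes that every column whose name starts with "Host." should be counted.

```r
library(dplyr)

# Hypothetical toy data for illustration only (not the asker's 559 rows)
toy <- tibble(
  Host.H = c("Human", "Human", NA, NA),
  Host.I = c("Intermediate", NA, "Intermediate", NA),
  Host.B = c("Bat", NA, NA, "Bat"),
  Host.C = c("Consensus", NA, "Consensus", NA)
)

result <- toy %>%
  mutate(
    # number of non-NA Host.* cells in each row
    count = rowSums(!is.na(across(starts_with("Host.")))),
    Type = case_when(
      count == 4 ~ "Conserved",
      count > 1  ~ "Shared",
      count == 1 ~ "Unique"
    )
  ) %>%
  select(-count)  # drop the helper column once Type is assigned

print(result)
```

In this toy data the first row has all four values and is labelled "Conserved", the third row has two values and is labelled "Shared", and the second and fourth rows have a single value each and are labelled "Unique".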
