Create a new column using dplyr based on string values in all other columns in a data frame in R

I have a data frame, my_df:

my_df <- structure(list(C1 = c("A", "X", "X", "A", "A"), F2 = c("A", "A", 
"A", "A", "A"), T3 = c("A", "A", "X", "X", "A"), S4 = c("A", 
"A", "A", "A", "X"), B5 = c("A", "A", "A", "A", "A")), class = "data.frame", row.names = c("ID1", 
"ID2", "ID3", "ID4", "ID5"))

> my_df
    C1 F2 T3 S4 B5
ID1  A  A  A  A  A
ID2  X  A  A  A  A
ID3  X  A  X  A  A
ID4  A  A  X  A  A
ID5  A  A  A  X  A

I want to create a new column, new_col, that says "same" if the values in all other columns of a row are identical, and "diff" otherwise. That is, the resulting data frame would look like:

> my_df
    C1 F2 T3 S4 B5 new_col
ID1  A  A  A  A  A    same
ID2  X  A  A  A  A    diff
ID3  X  A  X  A  A    diff
ID4  A  A  X  A  A    diff
ID5  A  A  A  X  A    diff

What is the best way to achieve this using dplyr?

library(tidyverse)
my_df <- structure(list(C1 = c("A", "X", "X", "A", "A"),
                        F2 = c("A", "A", "A", "A", "A"),
                        T3 = c("A", "A", "X", "X", "A"),
                        S4 = c("A", "A", "A", "A", "X"),
                        B5 = c("A", "A", "A", "A", "A")),
                   class = "data.frame",
                   row.names = c("ID1","ID2", "ID3", "ID4", "ID5"))
my_df %>% 
  rowwise() %>% 
  mutate(new_col = if_else(
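    length(unique(c_across(everything()))) == 1,  # name the columns explicitly; calling c_across() with no arguments is deprecated in recent dplyr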
    "same",
    "diff"
  ))
#> # A tibble: 5 × 6
#> # Rowwise: 
#>   C1    F2    T3    S4    B5    new_col
#>   <chr> <chr> <chr> <chr> <chr> <chr>  
#> 1 A     A     A     A     A     same   
#> 2 X     A     A     A     A     diff   
#> 3 X     A     X     A     A     diff   
#> 4 A     A     X     A     A     diff   
#> 5 A     A     A     X     A     diff
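
A vectorized alternative to rowwise(), which may matter on larger data frames, is to compare every column against the first one with if_all(). The following is only a sketch under the assumption that the first column is named C1, as in the example data, and that there are no NA values:

```r
library(dplyr)

# Same example data as in the question, built with data.frame()
my_df <- data.frame(
  C1 = c("A", "X", "X", "A", "A"),
  F2 = c("A", "A", "A", "A", "A"),
  T3 = c("A", "A", "X", "X", "A"),
  S4 = c("A", "A", "A", "A", "X"),
  B5 = c("A", "A", "A", "A", "A"),
  row.names = c("ID1", "ID2", "ID3", "ID4", "ID5"),
  stringsAsFactors = FALSE
)

# if_all() is TRUE for a row only when every column equals C1 in that row
result <- my_df %>%
  mutate(new_col = if_else(if_all(everything(), ~ .x == C1), "same", "diff"))

result$new_col
```

Because the comparison runs column by column rather than row by row, no rowwise() grouping is needed and the result stays a plain data frame.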

There are several ways to do this. One is to check if each value equals the first one:

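# base R (two alternatives; run only one of them, while new_col does not exist yet)
my_df$new_col <- ifelse(rowSums(my_df == my_df[, 1]) == ncol(my_df), "same", "diff")
# row-wise check that every value equals the first value in its row
my_df$new_col <- ifelse(apply(my_df, 1, function(x) all(x == x[1])), "same", "diff")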

#dplyr
my_df %>% 
  dplyr::mutate(new_col = ifelse(rowSums(. == .[, 1]) == ncol(.), "same", "diff"))

    C1 F2 T3 S4 B5 new_col
ID1  A  A  A  A  A    same
ID2  X  A  A  A  A    diff
ID3  X  A  X  A  A    diff
ID4  A  A  X  A  A    diff
ID5  A  A  A  X  A    diff
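
To see why the rowSums() comparison works, it can help to look at the intermediate steps. This sketch uses the same example data; the names eq and matches are only illustrative:

```r
# Same example data as in the question
my_df <- data.frame(
  C1 = c("A", "X", "X", "A", "A"),
  F2 = c("A", "A", "A", "A", "A"),
  T3 = c("A", "A", "X", "X", "A"),
  S4 = c("A", "A", "A", "A", "X"),
  B5 = c("A", "A", "A", "A", "A"),
  row.names = c("ID1", "ID2", "ID3", "ID4", "ID5"),
  stringsAsFactors = FALSE
)

# Comparing the data frame with its first column recycles that column
# down every column, giving a logical matrix of the same shape
eq <- my_df == my_df[, 1]
print(eq)

# Number of columns per row that match the first column
matches <- rowSums(eq)
print(matches)

# A row is "same" only when all ncol(my_df) columns match
all_same <- matches == ncol(my_df)
print(all_same)
```

Note that the check depends on ncol(my_df), so it should run before new_col is added; otherwise the extra column is counted as well.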

You can also check if the length of unique values per row is 1:

apply(my_df, 1, function(x) length(unique(x)) == 1)
#apply(my_df, 1, function(x) dplyr::n_distinct(x) == 1)
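
The apply() call above only returns a logical vector. A minimal sketch of turning it into the requested column, using the same example data (the name all_same is only illustrative):

```r
# Same example data as in the question
my_df <- data.frame(
  C1 = c("A", "X", "X", "A", "A"),
  F2 = c("A", "A", "A", "A", "A"),
  T3 = c("A", "A", "X", "X", "A"),
  S4 = c("A", "A", "A", "A", "X"),
  B5 = c("A", "A", "A", "A", "A"),
  row.names = c("ID1", "ID2", "ID3", "ID4", "ID5"),
  stringsAsFactors = FALSE
)

# TRUE where a row contains exactly one distinct value
all_same <- apply(my_df, 1, function(x) length(unique(x)) == 1)

# Map TRUE/FALSE to the labels; unname() drops the row-name labels
# that apply() attaches to its result
my_df$new_col <- ifelse(unname(all_same), "same", "diff")
print(my_df)
```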

A data.table option using uniqueN (note that setDT() drops the row names, as the output shows):

library(data.table)
setDT(my_df)[, new_col := c("diff", "same")[(uniqueN(unlist(.SD)) == 1) + 1], 1:nrow(my_df)]
my_df

Output:

   C1 F2 T3 S4 B5 new_col
1:  A  A  A  A  A    same
2:  X  A  A  A  A    diff
3:  X  A  X  A  A    diff
4:  A  A  X  A  A    diff
5:  A  A  A  X  A    diff
