
Create a new column using dplyr based on string values in all other columns in a data frame in R

I have a data frame my_df:

my_df <- structure(list(C1 = c("A", "X", "X", "A", "A"), F2 = c("A", "A", 
"A", "A", "A"), T3 = c("A", "A", "X", "X", "A"), S4 = c("A", 
"A", "A", "A", "X"), B5 = c("A", "A", "A", "A", "A")), class = "data.frame", row.names = c("ID1", 
"ID2", "ID3", "ID4", "ID5"))

> my_df
    C1 F2 T3 S4 B5
ID1  A  A  A  A  A
ID2  X  A  A  A  A
ID3  X  A  X  A  A
ID4  A  A  X  A  A
ID5  A  A  A  X  A

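I want to create a new column, new_col, that says "same" if all the values in the other columns of that row are identical, and "diff" otherwise. The resulting data frame should look like this: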

> my_df
    C1 F2 T3 S4 B5 new_col
ID1  A  A  A  A  A    same
ID2  X  A  A  A  A    diff
ID3  X  A  X  A  A    diff
ID4  A  A  X  A  A    diff
ID5  A  A  A  X  A    diff

What is the best way to do this with dplyr?

library(tidyverse)
my_df <- structure(list(C1 = c("A", "X", "X", "A", "A"),
                        F2 = c("A", "A", "A", "A", "A"),
                        T3 = c("A", "A", "X", "X", "A"),
                        S4 = c("A", "A", "A", "A", "X"),
                        B5 = c("A", "A", "A", "A", "A")),
                   class = "data.frame",
                   row.names = c("ID1","ID2", "ID3", "ID4", "ID5"))
my_df %>% 
  rowwise() %>% 
  mutate(new_col = if_else(
    length(unique(c_across(everything()))) == 1,
    "same",
    "diff"
  ))
#> # A tibble: 5 × 6
#> # Rowwise: 
#>   C1    F2    T3    S4    B5    new_col
#>   <chr> <chr> <chr> <chr> <chr> <chr>  
#> 1 A     A     A     A     A     same   
#> 2 X     A     A     A     A     diff   
#> 3 X     A     X     A     A     diff   
#> 4 A     A     X     A     A     diff   
#> 5 A     A     A     X     A     diff
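The rowwise() answer above evaluates c_across() once per row, which can be slow on large data frames. The sketch below applies the same rule in a single vectorised pass. It is not part of the original answer and assumes a dplyr version that provides if_all() (added in the 1.0.x series). Each column is compared with C1, and if_all() combines the comparisons, so a row counts as "same" only when every column equals C1:

```r
library(dplyr)

# Same example data as in the question
my_df <- data.frame(
  C1 = c("A", "X", "X", "A", "A"),
  F2 = c("A", "A", "A", "A", "A"),
  T3 = c("A", "A", "X", "X", "A"),
  S4 = c("A", "A", "A", "A", "X"),
  B5 = c("A", "A", "A", "A", "A"),
  row.names = c("ID1", "ID2", "ID3", "ID4", "ID5"),
  stringsAsFactors = FALSE
)

# if_all() is TRUE for a row only when every selected column equals C1 in that row
res <- my_df %>%
  mutate(new_col = if_else(if_all(everything(), ~ .x == C1), "same", "diff"))
res
```

This version does not group by row, so no ungroup() is needed afterwards.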

There are several ways to do this. One is to check whether every value in a row equals the value in the first column:

#base R
my_df$new_col <- ifelse(rowSums(my_df == my_df[, 1]) == ncol(my_df), "same", "diff")
#or, checking each row separately (run on the original my_df, before new_col is added):
#my_df$new_col <- ifelse(apply(my_df, 1, function(x) all(x == x[1])), "same", "diff")

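#dplyr (starting again from the original my_df); `.` is the magrittr placeholder for the piped data
library(dplyr)
my_df %>% 
  dplyr::mutate(new_col = ifelse(rowSums(. == .[, 1]) == ncol(.), "same", "diff"))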

    C1 F2 T3 S4 B5 new_col
ID1  A  A  A  A  A    same
ID2  X  A  A  A  A    diff
ID3  X  A  X  A  A    diff
ID4  A  A  X  A  A    diff
ID5  A  A  A  X  A    diff
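The intermediate results show why the rowSums() check works. Comparing a data frame with a vector (here the first column) recycles the vector down every column, so my_df == my_df[, 1] is a logical matrix. A row is "same" only if all of its entries are TRUE, that is, only if its row sum equals ncol(my_df). The sketch below runs these steps on the question's data. The names eq, matches and new_col are illustrative only:

```r
my_df <- data.frame(
  C1 = c("A", "X", "X", "A", "A"),
  F2 = c("A", "A", "A", "A", "A"),
  T3 = c("A", "A", "X", "X", "A"),
  S4 = c("A", "A", "A", "A", "X"),
  B5 = c("A", "A", "A", "A", "A"),
  row.names = c("ID1", "ID2", "ID3", "ID4", "ID5"),
  stringsAsFactors = FALSE
)

# Each column is compared element-wise with column C1 (the vector is recycled per column)
eq <- my_df == my_df[, 1]

# Number of columns matching C1 in each row: 5, 1, 2, 4, 4 for ID1 ... ID5
matches <- rowSums(eq)
matches

# Only rows where every column matches (5 of 5) are "same"
new_col <- ifelse(matches == ncol(my_df), "same", "diff")
new_col
```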

You can also check whether the number of unique values in each row is 1:

apply(my_df, 1, function(x) length(unique(x)) == 1)
#apply(my_df, 1, function(x) dplyr::n_distinct(x) == 1)
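apply() converts the data frame to a character matrix and then calls the function once per row. The sketch below avoids the per-row call by comparing each row's minimum with its maximum. This technique is a substitute, not one of the original answers. pmin() and pmax() work element-wise across vectors, including character vectors, and a row contains a single distinct value exactly when its minimum equals its maximum. The sketch assumes ordinary string values for which distinct strings never collate as equal in the current locale:

```r
my_df <- data.frame(
  C1 = c("A", "X", "X", "A", "A"),
  F2 = c("A", "A", "A", "A", "A"),
  T3 = c("A", "A", "X", "X", "A"),
  S4 = c("A", "A", "A", "A", "X"),
  B5 = c("A", "A", "A", "A", "A"),
  row.names = c("ID1", "ID2", "ID3", "ID4", "ID5"),
  stringsAsFactors = FALSE
)

# do.call() passes each column as a separate argument, so pmin()/pmax()
# return the smallest/largest value in each row (string comparison for characters)
row_min <- do.call(pmin, my_df)
row_max <- do.call(pmax, my_df)

# A row holds only one distinct value exactly when its min equals its max
all_same <- row_min == row_max
my_df$new_col <- ifelse(all_same, "same", "diff")
my_df
```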

A data.table option using uniqueN:

library(data.table)
setDT(my_df)[, new_col := c("diff", "same")[(uniqueN(unlist(.SD)) == 1) + 1], 1:nrow(my_df)]
my_df

Output:

   C1 F2 T3 S4 B5 new_col
1:  A  A  A  A  A    same
2:  X  A  A  A  A    diff
3:  X  A  X  A  A    diff
4:  A  A  X  A  A    diff
5:  A  A  A  X  A    diff
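Grouping by 1:nrow(my_df) evaluates the j expression once per row, which can become slow on large tables. The vectorised data.table sketch below is not from the original answer and assumes a data.table version that provides fifelse(). It compares every column in .SD with C1 and combines the comparisons with Reduce(), so there is no per-row grouping:

```r
library(data.table)

dt <- data.table(
  C1 = c("A", "X", "X", "A", "A"),
  F2 = c("A", "A", "A", "A", "A"),
  T3 = c("A", "A", "X", "X", "A"),
  S4 = c("A", "A", "A", "A", "X"),
  B5 = c("A", "A", "A", "A", "A")
)

# lapply() compares each column in .SD with C1, giving one logical vector per column;
# Reduce(`&`, ...) ANDs them, so a row is TRUE only if every column equals C1
dt[, new_col := fifelse(Reduce(`&`, lapply(.SD, `==`, C1)), "same", "diff")]
dt
```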


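Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.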

 
© 2020-2024 STACKOOM.COM