Remove rows based on conditions across multiple columns in R

Question

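I want to subset rows based on conditions that span several columns. For example, in the data below, I want my script to remove every row that meets the following condition:

The row has no valid entry in any of the five columns One to Five (valid entries are: poor, good, very good, excellent). In other words, remove the rows in which every one of these columns holds an invalid entry (invalid entries are "NULL", "''", or anything containing "@").

In this example only Chris would be removed. Everyone else is kept because each of them has at least one valid entry in the five columns.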

Data:

library(tibble)  # provides tibble()

df <-
  tibble(
    Name = c("John", "Peter", "Paul", "Joy", "Mike", "Vinc", "Ben", "Chris"),
    One = c("NULL", "@gmail", "NULL", "good", "''", "very_good", "excellent", "NULL"),
    Two = c("@yahoo", "''", "good", "good", "good", "excellent", "NULL", "''"),
    Three = c("''", "good", "very good", "poor", "excellent", "NULL", "NULL", "@gmai"),
    Four = c("good", "good", "good", "NULL", "good", "good", "good", "NULL"),
    Five = c("@gmail", "very good", "excellent", "poor", "NULL", "NULL", "NULL", "NULL")
  )

Answer 1

You can use dplyr::if_any here:

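library(dplyr)

# The data spells "very good" both with a space and with an underscore,
# so both spellings are listed as valid
valid_entry <- c("poor", "good", "very good", "very_good", "excellent")

# Keep rows where at least one of the columns One to Five holds a valid entry
df %>% filter(if_any(One:Five, ~ .x %in% valid_entry))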
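If dplyr is not available, the same whitelist filter can be sketched in base R. This assumes the column names One to Five from the question's data, and listing both "very good" and "very_good" as valid is an assumption based on the data's two spellings. Reduce() with "|" combines the per-column %in% tests into one "at least one valid entry" vector, mirroring what if_any() computes inside filter().

```r
# Question's data as a base data.frame (no tibble needed)
df <- data.frame(
  Name  = c("John", "Peter", "Paul", "Joy", "Mike", "Vinc", "Ben", "Chris"),
  One   = c("NULL", "@gmail", "NULL", "good", "''", "very_good", "excellent", "NULL"),
  Two   = c("@yahoo", "''", "good", "good", "good", "excellent", "NULL", "''"),
  Three = c("''", "good", "very good", "poor", "excellent", "NULL", "NULL", "@gmai"),
  Four  = c("good", "good", "good", "NULL", "good", "good", "good", "NULL"),
  Five  = c("@gmail", "very good", "excellent", "poor", "NULL", "NULL", "NULL", "NULL"),
  stringsAsFactors = FALSE
)

# Assumption: both spellings of "very good" count as valid
valid_entry <- c("poor", "good", "very good", "very_good", "excellent")
cols <- c("One", "Two", "Three", "Four", "Five")

# One logical vector per column: TRUE where the cell is a valid entry
valid_by_col <- lapply(df[cols], function(x) x %in% valid_entry)

# OR the columns together: TRUE if at least one column is valid
has_valid <- Reduce(`|`, valid_by_col)

kept <- df[has_valid, ]
kept$Name  # every name except "Chris"
```

Swapping "|" for "&" in the Reduce() call gives the if_all() counterpart.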

Answer 2

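Edit:

To filter the rows in which at least one of the columns One to Five contains an "invalid" value, i.e. a value that matches none of the valid words: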

library(dplyr)
library(stringr)
df %>% 
  filter(if_any(One:Five, 
                ~!str_detect(., paste0(c("poor", "good", "very_good", "excellent"), collapse = "|"))))
# A tibble: 8 × 6
  Name  One       Two       Three     Four  Five     
  <chr> <chr>     <chr>     <chr>     <chr> <chr>    
1 John  NULL      @yahoo    ''        good  @gmail   
2 Peter @gmail    ''        good      good  very good
3 Paul  NULL      good      very good good  excellent
4 Joy   good      good      poor      NULL  poor     
5 Mike  ''        good      excellent good  NULL     
6 Vinc  very_good excellent NULL      good  NULL     
7 Ben   excellent NULL      NULL      good  NULL     
8 Chris NULL      ''        @gmai     NULL  NULL     
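The pattern passed to str_detect() is an ordinary regular expression, so it matches substrings: "very good" counts as valid only because it contains "good", and a hypothetical value such as "goodbye" would count as valid too. The sketch below uses base R's grepl(), which behaves like str_detect() for this purpose, to compare the answer's unanchored pattern with an anchored variant (^...$) in which the whole cell must equal one of the words. The anchored variant and the extra "very good" alternative are assumptions, not part of the answer.

```r
valid <- c("poor", "good", "very_good", "excellent")

# Unanchored pattern, built the same way as in the answer
loose <- paste0(valid, collapse = "|")   # "poor|good|very_good|excellent"

# Anchored variant: the whole cell must equal one of the alternatives
# ("very good" with a space is added because the data uses both spellings)
strict <- paste0("^(", paste0(c(valid, "very good"), collapse = "|"), ")$")

# "goodbye" is a hypothetical value, not taken from the question's data
cells <- c("very good", "NULL", "@gmail", "goodbye")
loose_hits  <- grepl(loose, cells)    # c(TRUE, FALSE, FALSE, TRUE)
strict_hits <- grepl(strict, cells)   # c(TRUE, FALSE, FALSE, FALSE)
```

With the question's data both patterns give the same result, because no cell merely contains a valid word as part of a longer string.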

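To filter the rows in which all of the columns One to Five contain "invalid" values (these are the rows the question wants removed):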

library(dplyr)
library(stringr)
df %>% 
  filter(if_all(One:Five, 
                ~!str_detect(., paste0(c("poor", "good", "very_good", "excellent"), collapse = "|"))))
# A tibble: 1 × 6
  Name  One   Two   Three Four  Five 
  <chr> <chr> <chr> <chr> <chr> <chr>
1 Chris NULL  ''    @gmai NULL  NULL
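The if_all() call above returns the rows to be removed (only Chris). To drop them instead, as the question asks, the condition can be negated; in dplyr that would be filter(!if_all(...)) with the same arguments. A base R sketch of the same logic, assuming a three-row subset of the question's data for brevity, ANDs the per-column "no valid word" tests together with Reduce() and then indexes with the result and its negation:

```r
# Three rows taken from the question's data
df <- data.frame(
  Name  = c("John", "Joy", "Chris"),
  One   = c("NULL", "good", "NULL"),
  Two   = c("@yahoo", "good", "''"),
  Three = c("''", "poor", "@gmai"),
  Four  = c("good", "NULL", "NULL"),
  Five  = c("@gmail", "poor", "NULL"),
  stringsAsFactors = FALSE
)

pattern <- paste0(c("poor", "good", "very_good", "excellent"), collapse = "|")
cols <- c("One", "Two", "Three", "Four", "Five")

# TRUE where a cell matches none of the valid words (the answer's !str_detect())
invalid_by_col <- lapply(df[cols], function(x) !grepl(pattern, x))

# AND across columns: TRUE only if every column is invalid
all_invalid <- Reduce(`&`, invalid_by_col)

removed <- df[all_invalid, ]    # the rows if_all() returns (Chris)
kept    <- df[!all_invalid, ]   # the rows the question wants to keep (John, Joy)
```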

Answer 3

You can try this:

# sample data
df <- data.frame(
  Name = c("John", "Peter", "Chris"),
  One  = c("NULL", "@gmail", "NULL"),
  Two = c("@yahoo", "", "@gmai"),
  Three = c("very good", "good", "NULL")
)

A function that checks whether a row is invalid (it returns TRUE for an invalid row):

isInvalid <- function(row) {
  row <- row[-1]                          # ignore Name
  ats <- length(grep("@", row))           # count the cells that contain "@"
  invalids <- c("NULL", "", "''")         # error values ("''" is how the question's data writes empty cells)
  invs <- sum(row %in% invalids)          # count the cells holding an error value
  (ats + invs) == length(row)             # invalid row if every cell is an "@" cell or an error value
}

Call the function on every row of the data frame:

invalidrows <- apply(df, MARGIN = 1, FUN = isInvalid)
# results in FALSE FALSE TRUE

Extract the invalid rows from the original data frame:

invalid <- df[invalidrows,]
# returns the 'Chris' row
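apply() converts the data frame to a character matrix and calls isInvalid() once per row. A vectorised sketch of the same rule (a cell is invalid if it contains "@" or equals an error value; a row is invalid if all of its non-Name cells are) builds a logical matrix of invalid cells and compares rowSums() with the number of columns, mirroring (ats + invs) == length(row). Negating the result removes the invalid rows instead of extracting them, which is what the question asks for. Treating "''" as an error value is an assumption so the sketch also fits the question's data.

```r
# Sample data from this answer
df <- data.frame(
  Name  = c("John", "Peter", "Chris"),
  One   = c("NULL", "@gmail", "NULL"),
  Two   = c("@yahoo", "", "@gmai"),
  Three = c("very good", "good", "NULL"),
  stringsAsFactors = FALSE
)

invalids <- c("NULL", "", "''")   # assumption: "''" also counts as an empty cell
cells <- as.matrix(df[-1])        # every column except Name, as a character matrix

# Same per-cell test as isInvalid(); grepl() and %in% return plain vectors,
# so matrix() restores the original shape (both work column by column)
bad <- matrix(grepl("@", cells) | cells %in% invalids, nrow = nrow(cells))

invalidrows <- rowSums(bad) == ncol(bad)   # c(FALSE, FALSE, TRUE)

invalid <- df[invalidrows, ]    # extract the invalid rows (Chris)
cleaned <- df[!invalidrows, ]   # remove them (John, Peter)
```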

3 answers

Answer 1: 2 votes, 2022-02-15 12:10:11

Answer 2: 1 vote, 2022-02-15 12:16:36

Answer 3: 0 votes, 2022-02-15 13:56:29
