
Dropping rows based on multiple column conditions in R

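I want to subset rows based on conditions that span several columns. For example, in the attached image, I want my script to drop every row that meets the following condition: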

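Drop any row that has no valid entry in any of the five columns One to Five (valid entries are: poor, good, very good, excellent). In other words, drop rows in which every entry is invalid (invalid entries are: "NULL", "''", or anything containing "@").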

In this example, only Chris would be excluded. Everyone else is kept because they have at least one valid entry among the 5 columns.

Data:

df <-
  tibble(
    Name = c("John", "Peter", "Paul", "Joy", "Mike", "Vinc", "Ben", "Chris"),
    One = c("NULL", "@gmail", "NULL", "good", "''", "very_good", "excellent", "NULL"),
    Two = c("@yahoo", "''", "good", "good", "good", "excellent", "NULL", "''"),
    Three = c("''", "good", "very good", "poor", "excellent", "NULL", "NULL", "@gmai"),
    Four = c("good", "good", "good", "NULL", "good", "good", "good", "NULL"),
    Five = c("@gmail", "very good", "excellent", "poor", "NULL", "NULL", "NULL", "NULL")
  )

You can use dplyr::if_any here:

library(dplyr)

# the data spells this label both as "very good" and "very_good"
valid_entry <- c("poor", "good", "very good", "very_good", "excellent")

df %>% filter(if_any(One:Five, ~.x %in% valid_entry))
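
The question also defines the invalid entries directly ("NULL", "''", or anything containing "@"), so the same filter can be phrased as "drop a row only when every column is invalid". A minimal sketch of that reading, assuming the question's column names; the small `df` (a subset of the question's rows) and the helper `is_invalid` are made up for illustration and are not part of the original answer:

```r
library(dplyr)

# Small illustrative data set (three of the question's rows)
df <- tibble(
  Name  = c("John", "Joy", "Chris"),
  One   = c("NULL", "good", "NULL"),
  Two   = c("@yahoo", "good", "''"),
  Three = c("''", "poor", "@gmai"),
  Four  = c("good", "NULL", "NULL"),
  Five  = c("@gmail", "poor", "NULL")
)

# Hypothetical helper: TRUE for "NULL", "''", or any string containing "@"
is_invalid <- function(x) {
  x %in% c("NULL", "''") | grepl("@", x, fixed = TRUE)
}

# Keep a row unless all five columns are invalid
kept <- df %>% filter(!if_all(One:Five, is_invalid))
kept$Name
```

This version treats anything outside the invalid list as valid, whereas the `%in% valid_entry` version treats anything outside the valid list as invalid. The two agree on the question's data but can differ on unexpected values.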

Edit

Filter the rows in which any of the columns One to Five contains an "invalid" value:

library(dplyr)
library(stringr)
df %>% 
  filter(if_any(One:Five, 
                ~!str_detect(., paste0(c("poor", "good", "very_good", "excellent"), collapse = "|"))))
# A tibble: 8 × 6
  Name  One       Two       Three     Four  Five     
  <chr> <chr>     <chr>     <chr>     <chr> <chr>    
1 John  NULL      @yahoo    ''        good  @gmail   
2 Peter @gmail    ''        good      good  very good
3 Paul  NULL      good      very good good  excellent
4 Joy   good      good      poor      NULL  poor     
5 Mike  ''        good      excellent good  NULL     
6 Vinc  very_good excellent NULL      good  NULL     
7 Ben   excellent NULL      NULL      good  NULL     
8 Chris NULL      ''        @gmai     NULL  NULL     

Filter the rows in which all of the columns One to Five contain an "invalid" value:

library(dplyr)
library(stringr)
df %>% 
  filter(if_all(One:Five, 
                ~!str_detect(., paste0(c("poor", "good", "very_good", "excellent"), collapse = "|"))))
# A tibble: 1 × 6
  Name  One   Two   Three Four  Five 
  <chr> <chr> <chr> <chr> <chr> <chr>
1 Chris NULL  ''    @gmai NULL  NULL 
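
Note that `str_detect` matches substrings, so with the unanchored pattern above a cell such as "good@gmail" would count as valid. If whole-cell matches are wanted, the pattern can be anchored. The following is only a sketch: the row "Dana", the two-column `df`, and the choice to accept both "very good" and "very_good" are assumptions made for illustration.

```r
library(dplyr)
library(stringr)

# Anchored pattern: the whole cell must be one of the valid labels.
# Accepting both "very good" and "very_good" is an assumption.
valid_pattern <- "^(poor|good|very[ _]good|excellent)$"

# Illustrative data; "Dana" is a made-up row with a substring match
df <- tibble(
  Name = c("Paul", "Dana", "Chris"),
  One  = c("NULL", "good@gmail", "NULL"),
  Two  = c("very good", "''", "@gmai")
)

# Rows in which every checked column is invalid
all_invalid <- df %>%
  filter(if_all(One:Two, ~ !str_detect(.x, valid_pattern)))

# The complement: rows with at least one valid entry (these are the rows to keep)
some_valid <- df %>%
  filter(if_any(One:Two, ~ str_detect(.x, valid_pattern)))
```

Here `all_invalid` holds Dana and Chris, and `some_valid` holds Paul. To drop the all-invalid rows, as the question asks, use the `some_valid` form.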


You can try this:

# sample data
df <- data.frame(
  Name = c("John", "Peter", "Chris"),
  One  = c("NULL", "@gmail", "NULL"),
  Two = c("@yahoo", "", "@gmai"),
  Three = c("very good", "good", "NULL")
)

A function that checks whether a row is invalid:

isInvalid <- function(row) {
  row <- row[-1]                  # ignore Name
  ats <- length(grep("@", row))   # count the number of cells with "@"
  invalids <- c("NULL", "")       # list of error values
  invs <- length(which(row %in% invalids))  # count nr of error values
  (ats + invs) == length(row)               # if 'nr of ats' + 'nr of error values' is equal to the nr of cells -> invalid row
}

Call the function for each row of the data frame:

invalidrows <- apply(df, MARGIN = 1, FUN = isInvalid)
# results in FALSE FALSE TRUE

Extract the invalid rows from the original data frame:

invalid <- df[invalidrows,]
# returns the 'Chris' row
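
Since the question asks to drop the invalid rows, negating the index gives the rows to keep. As a hedged alternative to `apply()`, which converts every row to a character vector, the same check can be done column by column in base R. This is a sketch under the same assumptions as above ("NULL", the empty string, and anything containing "@" are invalid); the helper name `is_bad` is made up for illustration:

```r
# Same sample data as above
df <- data.frame(
  Name  = c("John", "Peter", "Chris"),
  One   = c("NULL", "@gmail", "NULL"),
  Two   = c("@yahoo", "", "@gmai"),
  Three = c("very good", "good", "NULL")
)

# Columns to check (everything except Name)
cols <- setdiff(names(df), "Name")

# Hypothetical helper: TRUE where a cell is "NULL", empty, or contains "@"
is_bad <- function(x) x %in% c("NULL", "") | grepl("@", x, fixed = TRUE)

# A row is invalid when every checked column is bad:
# combine the per-column logical vectors with `&`
invalidrows <- Reduce(`&`, lapply(df[cols], is_bad))

# Keep only the rows that have at least one valid entry
valid <- df[!invalidrows, ]
```

With this data `invalidrows` is FALSE FALSE TRUE, so `valid` contains the John and Peter rows.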
