Dropping rows based on multiple column conditions in R
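I want to subset rows based on conditions that span multiple columns. For example, in the example data below, I want my script to drop all rows that meet the following condition:
Drop every row that has no valid entry in any of the five columns One to Five (valid entries are: poor, good, very good, excellent). In other words, drop rows in which all five entries are invalid (an invalid entry is "NULL", "''", or anything containing "@").
In this example, only Chris would be excluded. Everyone else would be kept, because they have at least one valid entry among the five columns.
Data: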
df <-
tibble(
Name = c("John", "Peter", "Paul", "Joy", "Mike", "Vinc", "Ben", "Chris"),
One = c("NULL", "@gmail", "NULL", "good", "''", "very_good", "excellent", "NULL"),
Two = c("@yahoo", "''", "good", "good", "good", "excellent", "NULL", "''"),
Three = c("''", "good", "very good", "poor", "excellent", "NULL", "NULL", "@gmai"),
Four = c("good", "good", "good", "NULL", "good", "good", "good", "NULL"),
Five = c("@gmail", "very good", "excellent", "poor", "NULL", "NULL", "NULL", "NULL")
)
You can use dplyr::if_any here:
library(dplyr)
# the data spells this value both as "very good" and "very_good", so accept both
valid_entry <- c("poor", "good", "very good", "very_good", "excellent")
df %>% filter(if_any(One:Five, ~.x %in% valid_entry))
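The question defines the invalid entries explicitly ("NULL", "''", or anything containing "@"), so an alternative is to test for those instead of listing the valid values. The following is a minimal sketch on a reduced sample with three value columns; the helper name `is_invalid` is hypothetical and introduced only for illustration:

```r
library(dplyr)
library(stringr)

# Small sample in the same shape as the question's data
df <- tibble(
  Name  = c("John", "Joy", "Chris"),
  One   = c("NULL", "good", "NULL"),
  Two   = c("@yahoo", "good", "''"),
  Three = c("good", "NULL", "@gmai")
)

# Hypothetical helper: TRUE where an entry is "NULL", "''", or contains "@"
is_invalid <- function(x) x %in% c("NULL", "''") | str_detect(x, fixed("@"))

# Keep rows that have at least one entry that is not invalid
kept <- df %>% filter(if_any(One:Three, ~ !is_invalid(.x)))
kept$Name
```

This treats anything that is not explicitly invalid as valid, which may or may not be what you want if the columns can contain other unexpected values.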
Edit:
To filter for rows where any column from One to Five contains an "invalid" value:
library(dplyr)
library(stringr)
df %>%
filter(if_any(One:Five,
~!str_detect(., paste0(c("poor", "good", "very_good", "excellent"), collapse = "|"))))
# A tibble: 8 × 6
  Name  One       Two       Three     Four  Five
  <chr> <chr>     <chr>     <chr>     <chr> <chr>
1 John  NULL      @yahoo    ''        good  @gmail
2 Peter @gmail    ''        good      good  very good
3 Paul  NULL      good      very good good  excellent
4 Joy   good      good      poor      NULL  poor
5 Mike  ''        good      excellent good  NULL
6 Vinc  very_good excellent NULL      good  NULL
7 Ben   excellent NULL      NULL      good  NULL
8 Chris NULL      ''        @gmai     NULL  NULL
To filter for rows where all columns from One to Five contain an "invalid" value:
library(dplyr)
library(stringr)
df %>%
filter(if_all(One:Five,
~!str_detect(., paste0(c("poor", "good", "very_good", "excellent"), collapse = "|"))))
# A tibble: 1 × 6
  Name  One   Two   Three Four  Five
  <chr> <chr> <chr> <chr> <chr> <chr>
1 Chris NULL  ''    @gmai NULL  NULL
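The `if_all` filter above returns the rows to be dropped. To get the rows the question wants to keep, the same condition can be negated, which should be equivalent to asking for at least one valid column with `if_any`. A minimal sketch on a reduced sample (three value columns; the name `valid_pattern` is an assumption made for illustration):

```r
library(dplyr)
library(stringr)

# Small sample in the same shape as the data in the question
df <- tibble(
  Name  = c("John", "Joy", "Chris"),
  One   = c("NULL", "good", "NULL"),
  Two   = c("@yahoo", "good", "''"),
  Three = c("good", "NULL", "@gmai")
)

valid_pattern <- "poor|good|very_good|excellent"

# Rows in which every column is invalid (the rows to drop)
all_invalid <- df %>%
  filter(if_all(One:Three, ~ !str_detect(.x, valid_pattern)))

# Negating the condition keeps every other row
kept <- df %>%
  filter(!if_all(One:Three, ~ !str_detect(.x, valid_pattern)))

# Equivalent formulation: at least one column is valid
kept_any <- df %>%
  filter(if_any(One:Three, ~ str_detect(.x, valid_pattern)))
```

Both `kept` and `kept_any` select the same rows here; which form reads better is largely a matter of taste.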
You can try this:
# sample data
df <- data.frame(
Name = c("John", "Peter", "Chris"),
One = c("NULL", "@gmail", "NULL"),
Two = c("@yahoo", "", "@gmai"),
Three = c("very good", "good", "NULL")
)
A function that checks whether a row is invalid, that is, whether all of its entries are invalid:
isInvalid <- function(row) {
row <- row[-1] # ignore Name
ats <- length(grep("@", row)) # count the number of cells with "@"
invalids <- c("NULL", "") # list of error values
invs <- length(which(row %in% invalids)) # count nr of error values
(ats + invs) == length(row) # if 'nr of ats' + 'nr of error values' is equal to the nr of cells -> invalid row
}
Call the function for each row of the data frame:
invalidrows <- apply(df, MARGIN = 1, FUN = isInvalid)
# results in FALSE FALSE TRUE
Extract the invalid rows from the original data frame:
invalid <- df[invalidrows,]
# returns the 'Chris' row
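Since the question asks to drop the invalid rows rather than extract them, the logical vector can simply be negated. A self-contained base R sketch that repeats the sample data and the `isInvalid` function from this answer (the variable name `valid` is an assumption made for illustration):

```r
# Sample data from this answer
df <- data.frame(
  Name  = c("John", "Peter", "Chris"),
  One   = c("NULL", "@gmail", "NULL"),
  Two   = c("@yahoo", "", "@gmai"),
  Three = c("very good", "good", "NULL")
)

# TRUE when every cell except Name is "NULL", empty, or contains "@"
isInvalid <- function(row) {
  row <- row[-1]                                  # ignore Name
  ats <- length(grep("@", row))                   # cells containing "@"
  invs <- length(which(row %in% c("NULL", "")))   # cells with error values
  (ats + invs) == length(row)
}

invalidrows <- apply(df, MARGIN = 1, FUN = isInvalid)

# Keep only the rows that are not invalid
valid <- df[!invalidrows, ]
as.character(valid$Name)
```

Note that `apply` converts the data frame to a character matrix first, which is fine here because all columns are text.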