Deleting rows based on multiple column conditions in R
I have a very large dataset that I would like to clean up by deleting rows in which all of the selected columns are equal to 0. Here is what I currently have:
df1 <- filter(df, (n)==0 & (n+1)==0 & (n+2)==0 & (n+3)==0 & ...... & (n+100)==0)
How do I delete all rows that meet this condition across the columns from n to n+100?
Also, if I want to iterate this condition over the columns, do I need to state each column's name?
Here is an example dataset:
A tibble: 10 x 10
A B C D E F G H I J
1 1 1 1 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 1 1 1 0 0 0 1 1
0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 0 1 1
1 1 1 1 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0
0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
I want to remove all rows where columns F, G and H are all equal to 0, so that my result will be:
A tibble: 2 x 10
A B C D E F G H I J
1 1 1 1 1 1 1 0 1 1
0 0 0 0 0 0 0 1 0 0
One option is filter_at:
library(dplyr)
# Keep rows where at least one of columns 11 to 20 is non-zero
df %>%
  filter_at(11:20, any_vars(. != 0))
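filter_at() and any_vars() are superseded in dplyr 1.0.0 and later. Below is a minimal sketch of the same filter written with if_any(), assuming a dplyr release that provides it. The 20-column data frame wide_df is made up purely for illustration.

```r
library(dplyr)

# Hypothetical 20-column data frame: columns 11-20 are all zero in row 2 only
wide_df <- as.data.frame(matrix(1L, nrow = 3, ncol = 20))
wide_df[2, 11:20] <- 0L

# Keep a row if at least one of columns 11 to 20 is non-zero;
# rows where all of them equal 0 are dropped
kept <- wide_df %>%
  filter(if_any(11:20, ~ .x != 0))

nrow(kept)  # 2
```

Column positions work here just as they do in filter_at(), so no column names need to be written out.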
A reproducible example (df1 is defined at the end of this answer):
df1 %>%
  filter_at(vars(`11`:`13`), any_vars(. != 0))
# A tibble: 2 x 4
# `11` `12` `13` grp
# <dbl> <dbl> <dbl> <chr>
#1 1 0 4 a
#2 0 1 0 b
Or using across() from dplyr 1.0.0 (the development version at the time of writing):
df1 %>%
  filter(across(.cols = matches('^\\d+$'), ~ (.x == 0))) %>%
  anti_join(df1, .)
# A tibble: 2 x 4
# `11` `12` `13` grp
# <dbl> <dbl> <dbl> <chr>
#1 1 0 4 a
#2 0 1 0 b
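Recent dplyr releases deprecate using across() inside filter() in favour of if_any() and if_all(). The following sketch assumes such a release. It negates if_all(), which removes the need for the anti_join() round trip, and rebuilds df1 from the data at the end of this answer so it runs on its own.

```r
library(dplyr)

df1 <- tibble(`11` = c(1, 0, 0), `12` = c(0, 1, 0), `13` = c(4, 0, 0),
              grp = letters[1:3])

# if_all() is TRUE for rows where every numeric-named column equals 0;
# negating it keeps the rows with at least one non-zero value
res <- df1 %>%
  filter(!if_all(matches("^\\d+$"), ~ .x == 0))

res$grp  # "a" "b"
```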
Based on the OP's update, if 'n' is a column index and we want to filter based on the columns from that position through the 100 columns after it:
n <- 5
df %>%
  filter_at(n:(n+100), any_vars(. != 0))
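Selecting n:(n+100) by position would typically fail if n + 100 runs past the last column. Below is a base-R sketch that clamps the range and selects columns purely by position, so no column names need to be stated. The helper name keep_nonzero_rows, its width argument and the toy data are all hypothetical.

```r
# Hypothetical helper: drop rows whose values are all 0 in columns n .. n + width,
# clamping the upper bound to the number of columns in df
keep_nonzero_rows <- function(df, n, width = 100) {
  stopifnot(n >= 1, n <= ncol(df))
  last <- min(n + width, ncol(df))
  cols <- seq.int(n, last)
  # Comparing a data frame with 0 gives a logical matrix (rows x selected columns);
  # a row is kept if any of its entries is non-zero
  nonzero <- df[cols] != 0
  df[rowSums(nonzero) > 0, , drop = FALSE]
}

# Made-up example: columns 3 to 5 are all zero in row 2
toy <- data.frame(a = c(1, 1, 0), b = c(0, 0, 0), c = c(1, 0, 0),
                  d = c(0, 0, 2), e = c(0, 0, 0))
keep_nonzero_rows(toy, n = 3)  # keeps rows 1 and 3
```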
For the example dataset in the question (df2, defined at the end of this answer):
df2 %>%
  filter_at(vars(F, G, H), any_vars(. != 0))
# A tibble: 2 x 10
# A B C D E F G H I J
# <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#1 1 1 1 1 1 1 1 0 1 1
#2 0 0 0 0 0 0 0 1 0 0
Or using base R:
df2[rowSums(df2[c("F", "G", "H")] != 0) > 0,]
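For a very large dataset, note that df2[...] != 0 builds a full logical matrix of rows × selected columns before rowSums() runs. The base-R sketch below instead folds the columns into a single logical vector with Reduce() and |. It uses a cut-down copy of df2 (columns A, F, G and H only) and assumes the selected columns are numeric with no NA values.

```r
# Cut-down copy of df2 (only A, F, G, H) from the data at the end of this answer
df2 <- data.frame(
  A = c(1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L),
  F = c(0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L),
  G = c(0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L),
  H = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L)
)
cols <- c("F", "G", "H")

# Visit the selected columns one at a time, OR-ing "is non-zero" into an
# accumulator vector, so no rows x columns matrix is materialised
keep <- Reduce(function(acc, x) acc | (x != 0), df2[cols], init = FALSE)

df2[keep, ]  # rows 5 and 8, matching the rowSums() version
```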
Data:
df1 <- tibble(`11` = c(1, 0, 0), `12` = c(0, 1, 0), `13` = c(4, 0, 0),
              grp = letters[1:3])
df2 <- structure(list(A = c(1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L),
B = c(1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L), C = c(1L,
0L, 1L, 0L, 1L, 1L, 0L, 0L, 0L, 0L), D = c(1L, 0L, 1L, 0L,
1L, 1L, 0L, 0L, 0L, 0L), E = c(1L, 0L, 1L, 0L, 1L, 1L, 0L,
0L, 1L, 0L), F = c(0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L
), G = c(0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L), H = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L), I = c(0L, 0L, 1L, 0L,
1L, 0L, 0L, 0L, 0L, 0L), J = c(0L, 0L, 1L, 0L, 1L, 0L, 0L,
0L, 0L, 0L)), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))