Deleting rows based upon meeting multiple column conditions in R

I have a very large dataset that I would like to clean up by deleting the rows in which every entry in a selected set of columns equals 0. Here is what I currently have:

df1 <- filter(df, (n) == 0 & (n+1) == 0 & (n+2) == 0 & (n+3) == 0 & ...... (n+100) == 0)

How do I do this so that I delete every row that meets this condition across all of these columns, starting from the nth?

Also, if I wanted to iterate this condition, would I need to state the name of each column?

Here is an example dataset:

 A tibble: 10 x 10
 A B C D E F G H I J
 1 1 1 1 1 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0
 0 0 1 1 1 0 0 0 1 1
 0 0 0 0 0 0 0 0 0 0
 1 1 1 1 1 1 1 0 1 1
 1 1 1 1 1 0 0 0 0 0 
 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 1 0 0
 0 0 0 0 1 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0

I want to remove all rows where columns F, G and H are all equal to 0, so that my result will be:

 A tibble: 2 x 10
 A B C D E F G H I J
 1 1 1 1 1 1 1 0 1 1
 0 0 0 0 0 0 0 1 0 0

One option is filter_at:

library(dplyr)
# keep rows where at least one of columns 11 to 20 (by position) is non-zero
df %>%
   filter_at(11:20, any_vars(. != 0))

A reproducible example:

df1 %>% 
   filter_at(vars(`11`:`13`), any_vars(. != 0))
# A tibble: 2 x 4
#   `11`  `12`  `13` grp  
#     <dbl> <dbl> <dbl> <chr>
#1     1     0     4 a    
#2     0     1     0 b    

Or using across from the devel version of dplyr:

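# find the rows where every digit-named column is 0, then drop them from df1
# (the argument is named `.cols` in released versions of dplyr)
df1 %>%
    filter(across(.cols = matches('^\\d+$'), ~ (.x == 0))) %>%
    anti_join(df1, .)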
# A tibble: 2 x 4
#   `11`  `12`  `13` grp  
#  <dbl> <dbl> <dbl> <chr>
#1     1     0     4 a    
#2     0     1     0 b    
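
For readers on a newer dplyr, the same filter can probably be written without the anti_join step. This is a minimal sketch, assuming dplyr 1.0.4 or later (where if_any() is available); the name `toy` is made up here and simply repeats the df1 example data so the snippet runs on its own.

```r
library(dplyr)

# Copy of the df1 example data, under a hypothetical name
toy <- tibble(`11` = c(1, 0, 0), `12` = c(0, 1, 0), `13` = c(4, 0, 0),
              grp = letters[1:3])

# Keep rows where at least one digit-named column is non-zero,
# i.e. drop rows where all of them are 0
kept <- toy %>%
  filter(if_any(matches("^\\d+$"), ~ .x != 0))

kept$grp
```

Here the predicate is stated as "any column non-zero", which is the logical complement of "all columns zero", so no second pass over the data is needed.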

Update

Based on the OP's update, if 'n' is a column index and we want to filter based on the columns from that position through the 100 columns after it:

n <- 5
df %>%
     filter_at(n:(n+100), any_vars(. != 0))
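
On the question of whether column names have to be spelled out: they can be derived from positions instead. The following is only a sketch under assumptions, not the answer's own code: `toy`, `width` and `cols` are hypothetical names, the range is clamped so it cannot run past the last column, and if_any() requires dplyr 1.0.4 or later.

```r
library(dplyr)

# Hypothetical toy data: one id column followed by three value columns
toy <- tibble(
  id = 1:4,
  x1 = c(0, 0, 2, 0),
  x2 = c(0, 1, 0, 0),
  x3 = c(0, 0, 0, 0)
)

n <- 2      # position of the first column to test
width <- 2  # how many further columns to include (100 in the question)

# Turn the positions into names, clamping at the last column
cols <- names(toy)[n:min(n + width, ncol(toy))]

# Keep rows where at least one of the selected columns is non-zero
kept <- toy %>%
  filter(if_any(all_of(cols), ~ .x != 0))

kept$id
```

Building `cols` outside the pipeline keeps the selection explicit, and all_of() raises an error if any of the names is missing from the data.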

Update 2

df2 %>%
   filter_at(vars(F, G, H), any_vars(. != 0))
# A tibble: 2 x 10
#      A     B     C     D     E     F     G     H     I     J
#  <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#1     1     1     1     1     1     1     1     0     1     1
#2     0     0     0     0     0     0     0     1     0     0

Or using base R:

df2[rowSums(df2[c("F", "G", "H")] != 0) > 0,]
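
To see why the base R line works, it may help to break it into steps on a tiny data frame. This is an illustrative sketch; `toy`, `nonzero` and `n_nonzero` are hypothetical names introduced only for this example.

```r
# Hypothetical three-row example with the same column names as the question
toy <- data.frame(
  A = c(1, 0, 0),
  F = c(0, 1, 0),
  G = c(0, 0, 0),
  H = c(0, 0, 0)
)

# Logical matrix: TRUE wherever a selected cell is non-zero
nonzero <- toy[c("F", "G", "H")] != 0

# rowSums() counts the TRUE values in each row
n_nonzero <- rowSums(nonzero)

# Keep only the rows with at least one non-zero value in F, G or H
kept <- toy[n_nonzero > 0, ]
kept
```

The first row is dropped even though A is 1, because only F, G and H take part in the test. If the selected columns may contain NA, passing na.rm = TRUE to rowSums() is one way to keep the comparison from returning NA.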

data

df1 <- tibble(`11` = c(1, 0, 0), `12` = c(0, 1, 0), `13` = c(4, 0, 0),
  grp = letters[1:3])

df2 <- structure(list(A = c(1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L), 
    B = c(1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L), C = c(1L, 
    0L, 1L, 0L, 1L, 1L, 0L, 0L, 0L, 0L), D = c(1L, 0L, 1L, 0L, 
    1L, 1L, 0L, 0L, 0L, 0L), E = c(1L, 0L, 1L, 0L, 1L, 1L, 0L, 
    0L, 1L, 0L), F = c(0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L
    ), G = c(0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L), H = c(0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L), I = c(0L, 0L, 1L, 0L, 
    1L, 0L, 0L, 0L, 0L, 0L), J = c(0L, 0L, 1L, 0L, 1L, 0L, 0L, 
    0L, 0L, 0L)), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))
