Remove rows in R for groups in which all values are zero

Question

I have a data set similar to the one below:

d <- data.frame(A=c(11,11,11,11,21,21,111,111,111,44,44,44),
                B=c(0,1,0,0,0,0,1,0,0,0,0,0),
                C=c(3,2,1,3,4,2,1,2,3,12,22,31))
d
      A B  C
 1   11 0  3
 2   11 1  2
 3   11 0  1
 4   11 0  3
 5   21 0  4
 6   21 0  2
 7  111 1  1
 8  111 0  2
 9  111 0  3
10   44 0 12
11   44 0 22
12   44 0 31

I want to remove every group of rows sharing the same value of A in which B=0 for all rows. For example, when A=11 there is a row with B=1 (the 2nd row), so that group is kept. By contrast, for A=21 all B's equal zero, so I want to remove all rows with A=21. For A=44 all B's are again zero, so I want to remove all rows where A=44.

Finally, I need to get this data frame:

PS: Don't worry about column C; I've added it just to show that there are more than 2 columns in the data set.

Answer 1

You can use ave and logical subsetting like this:

d[!!ave(d$B, d$A, FUN=function(i) !all(i == 0)),]
    A B C
1  11 0 3
2  11 1 2
3  11 0 1
4  11 0 3
7 111 1 1
8 111 0 2
9 111 0 3

Here, !all(i == 0) returns TRUE when the vector contains a non-zero element. ave performs this check on each group and returns a vector the same size as the initial vector; !! converts it into a logical vector. This conversion is necessary because ave returns a vector of the same type as the initial vector. A more explicit alternative to !! is as.logical:

d[as.logical(ave(d$B, d$A, FUN=function(i) !all(i == 0))),]
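
To see why the conversion is needed, it may help to inspect the intermediate result of ave on the question's data. This is only a minimal sketch; the names flag and kept are introduced here for illustration and are not part of the original answer.

```r
d <- data.frame(A = c(11,11,11,11,21,21,111,111,111,44,44,44),
                B = c(0,1,0,0,0,0,1,0,0,0,0,0),
                C = c(3,2,1,3,4,2,1,2,3,12,22,31))

# ave() applies FUN within each group of A and writes the results back
# into a copy of d$B, so the logical TRUE/FALSE is coerced to numeric 1/0.
flag <- ave(d$B, d$A, FUN = function(i) !all(i == 0))
class(flag)   # numeric rather than logical

# Indexing rows with a numeric 1/0 vector would pick row 1 repeatedly
# instead of filtering, hence the conversion to logical first.
kept <- d[as.logical(flag), ]
unique(kept$A)   # the groups that survive the filter
```

Used as a row index without the conversion, flag would be read as positions (0 is dropped, 1 means "row 1"), which is not the intended filter.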

Answer 2

Or use a simple dplyr operation (by the way, I believe your expected output is off):

library(dplyr)
d %>% group_by(A) %>% filter(sum(B) >= 1)

Answer 3

How about a base R solution:

d[d$A %in% d$A[d$B!=0], ]
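
The one-liner above can be read as two steps. The sketch below spells them out on the question's data; keep_groups and res are illustrative names, not part of the original answer, and the call to unique is optional since %in% ignores duplicates.

```r
d <- data.frame(A = c(11,11,11,11,21,21,111,111,111,44,44,44),
                B = c(0,1,0,0,0,0,1,0,0,0,0,0),
                C = c(3,2,1,3,4,2,1,2,3,12,22,31))

# Step 1: collect the values of A that have at least one non-zero B.
keep_groups <- unique(d$A[d$B != 0])

# Step 2: keep every row whose A is one of those values.
res <- d[d$A %in% keep_groups, ]
res
```

Because no per-group function is called, the work is done by two vectorised operations, which is a plausible reason for the speed reported in the benchmark.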

It's also pretty fast:

library(microbenchmark)
library(dplyr)

set.seed(33)  ## making a larger example
A <- do.call(c, lapply(sample(10000, 2000), function(x) rep(x, sample(100, 1))))
B <- sample(c(0,1), length(A), replace = TRUE, prob = c(18/19, 1/19))
C <- sample(10^5, length(A), replace = TRUE)
df <- data.frame(A, B, C)

superBase <- function(d) {d[d$A %in% d$A[d$B!=0], ]}
  aveStat <- function(d) {d[!!ave(d$B, d$A, FUN=function(i) !all(i == 0)),]}
 dplyrSol <- function(d) {d %>% group_by(A) %>% filter(sum(B) >= 1)}

microbenchmark(superBase(df), aveStat(df), dplyrSol(df))
Unit: milliseconds
         expr      min       lq     mean   median       uq      max neval cld
superBase(df) 21.44030 23.81434 30.00466 26.67157 27.32492 167.1614   100 a  
  aveStat(df) 34.23338 39.03278 49.12483 40.29534 42.96865 204.0808   100  b 
 dplyrSol(df) 63.52571 65.32626 71.64950 67.20563 69.43784 215.5980   100   c

It gives the same results:

identical(superBase(df), aveStat(df))
[1] TRUE
