

Removing rows that have all zero values within one group in R

I have a data set similar to the one below:

d <- data.frame(A=c(11,11,11,11,21,21,111,111,111,44,44,44),
                B=c(0,1,0,0,0,0,1,0,0,0,0,0),
                C=c(3,2,1,3,4,2,1,2,3,12,22,31))
d
      A B  C
 1   11 0  3
 2   11 1  2
 3   11 0  1
 4   11 0  3
 5   21 0  4
 6   21 0  2
 7  111 1  1
 8  111 0  2
 9  111 0  3
10   44 0 12
11   44 0 22
12   44 0 31

I want to remove the rows of every group (defined by a unique value of A) in which all values of B are zero. For example, when A=11 there is a row with B=1 (the 2nd row), so that group stays. By contrast, for A=21 all B's equal zero, so I want to remove all rows with A=21. For A=44, again all B's are zero, so I want to remove all rows where A=44.

Finally, I need to get this data frame:

new_d
    A B  C
1  11 0  3
2  11 1  2
3  11 0  1
4  11 0  3
5 111 1 12
6 111 0 22
7 111 0 31

PS: Ignore column C; I added it only to show that the data set has more than two columns.

You can use ave and logical subsetting like this:

d[!!ave(d$B, d$A, FUN=function(i) !all(i == 0)),]
    A B C
1  11 0 3
2  11 1 2
3  11 0 1
4  11 0 3
7 111 1 1
8 111 0 2
9 111 0 3

Here, !all(i == 0) returns TRUE when the vector contains a non-zero element. ave performs this check on each group and returns a vector of the same size as the initial vector; !! then converts it into a logical vector. This conversion is necessary because ave returns a vector of the same type as the initial vector, which is numeric here. A more explicit alternative to !! is as.logical:

d[as.logical(ave(d$B, d$A, FUN=function(i) !all(i == 0))),]
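To make the intermediate step visible, here is a small sketch that prints what ave returns before the conversion. It reuses the d from the question; the names flag and keep are only illustrative.

```r
d <- data.frame(A = c(11,11,11,11,21,21,111,111,111,44,44,44),
                B = c(0,1,0,0,0,0,1,0,0,0,0,0),
                C = c(3,2,1,3,4,2,1,2,3,12,22,31))

# ave() applies FUN to each group of A and recycles the result over the
# group; because d$B is numeric, the TRUE/FALSE results are stored as 1/0.
flag <- ave(d$B, d$A, FUN = function(i) !all(i == 0))
flag
# 1 1 1 1 0 0 1 1 1 0 0 0

# Used directly as an index, 1/0 would be read as row positions
# (row 1 repeated, zeros dropped), so convert to logical first.
keep <- as.logical(flag)
d[keep, ]
```

The last line should give the same seven rows as the one-liner above.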

Or use a simple dplyr operation (by the way, I believe your expected output is off):

library(dplyr)
d %>% group_by(A) %>% filter(sum(B) >= 1)
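Note that sum(B) >= 1 relies on B holding only 0 and 1, as it does in the question. If B could contain negative or fractional values, a slightly more general sketch would test the condition directly with any(B != 0). The data frame d2 below is made up for illustration and is not the asker's data.

```r
library(dplyr)

# Hypothetical data: group 1 has non-zero B values that sum to 0,
# so a sum-based test would wrongly drop it.
d2 <- data.frame(A = c(1, 1, 2, 2, 3, 3),
                 B = c(-1, 1, 0, 0, 0, 2))

# Keep every row of a group in which at least one B is non-zero.
res <- d2 %>%
  group_by(A) %>%
  filter(any(B != 0)) %>%
  ungroup()

res$A
# 1 1 3 3
```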

How about a base R solution:

d[d$A %in% d$A[d$B!=0], ]
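The one-liner above can be read in two steps, sketched here with the d from the question. The names groups_to_keep and new_d are only illustrative, and unique() is added for readability; it is not needed for %in% to work.

```r
d <- data.frame(A = c(11,11,11,11,21,21,111,111,111,44,44,44),
                B = c(0,1,0,0,0,0,1,0,0,0,0,0),
                C = c(3,2,1,3,4,2,1,2,3,12,22,31))

# Step 1: values of A that occur on at least one row with a non-zero B.
groups_to_keep <- unique(d$A[d$B != 0])
groups_to_keep
# 11 111

# Step 2: keep every row whose A belongs to one of those groups.
new_d <- d[d$A %in% groups_to_keep, ]
new_d
```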

It's also pretty fast:

library(microbenchmark)
library(dplyr)

set.seed(33)  ## making a larger example
A <- do.call(c, lapply(sample(10000, 2000), function(x) rep(x, sample(100, 1))))
B <- sample(c(0,1), length(A), replace = TRUE, prob = c(18/19, 1/19))
C <- sample(10^5, length(A), replace = TRUE)
df <- data.frame(A, B, C)

superBase <- function(d) {d[d$A %in% d$A[d$B!=0], ]}
  aveStat <- function(d) {d[!!ave(d$B, d$A, FUN=function(i) !all(i == 0)),]}
 dplyrSol <- function(d) {d %>% group_by(A) %>% filter(sum(B) >= 1)}

microbenchmark(superBase(df), aveStat(df), dplyrSol(df))
Unit: milliseconds
         expr      min       lq     mean   median       uq      max neval cld
superBase(df) 21.44030 23.81434 30.00466 26.67157 27.32492 167.1614   100 a  
  aveStat(df) 34.23338 39.03278 49.12483 40.29534 42.96865 204.0808   100  b 
 dplyrSol(df) 63.52571 65.32626 71.64950 67.20563 69.43784 215.5980   100   c

It gives the same results:

identical(superBase(df), aveStat(df))
[1] TRUE

