Filter a data frame by grouped replicates in R

Question

I have the following data frame from an experiment with two replicates. I want to filter df so that, for each timestamp and ID, rows are kept only if score == 0 in both replicates.

df <- data.frame(timestamp = c(1, 1, 1, 1, 2, 2, 2, 2),
             ID = c(57, 57, 55, 55, 57, 57, 55, 55),
             replicate= c(1, 2, 1, 2, 1, 2, 1, 2),
             score = c(0, 1, 0, 0, 0, 1, 0, 0))

E.g., the desired output would be:

target <- data.frame(timestamp = c(1, 1, 2, 2), 
                 ID = c(55, 55, 55, 55), 
                 replicate = c(1, 2, 1, 2),
                 score = c(0, 0, 0, 0))

I've come up with a solution using a double loop, which is inelegant and most likely inefficient:

library(dplyr)  # provides %>% and filter()

tsvec <- df$timestamp %>% unique
idvec <- df$ID %>% unique
df_out <- c()

for(i in seq_along(tsvec)){ # loop along timestamps
  innerdat <- df %>% filter(timestamp == tsvec[i])
  for(j in seq_along(idvec)){ # loop along IDs
    innerdat2 <- innerdat %>% filter(ID == idvec[j])
    if(sum(innerdat2$score) == 0){
        df_out <- rbind(df_out, innerdat2)
    } else {
        NULL
    }
  }
}

Does anybody have a dplyr way of making this more efficient?

Answer 1

library(dplyr)
# Group by timestamp and ID so each replicate pair is checked on its own
df %>% group_by(timestamp, ID) %>% filter(all(score == 0))

# A tibble: 4 x 4
# Groups:   timestamp, ID [2]
  timestamp    ID replicate score
      <dbl> <dbl>     <dbl> <dbl>
1         1    55         1     0
2         1    55         2     0
3         2    55         1     0
4         2    55         2     0
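
The choice of grouping columns matters whenever an ID has zero scores in both replicates at one timestamp but not at another. A minimal sketch, assuming a hypothetical variant df2 of the question's data (not part of the original post), compares grouping by ID alone with grouping by timestamp and ID:

```r
library(dplyr)

# Hypothetical variant of the question's data: ID 57 scores 0 in both
# replicates at timestamp 2, but not at timestamp 1.
df2 <- data.frame(timestamp = c(1, 1, 1, 1, 2, 2, 2, 2),
                  ID        = c(57, 57, 55, 55, 57, 57, 55, 55),
                  replicate = c(1, 2, 1, 2, 1, 2, 1, 2),
                  score     = c(0, 1, 0, 0, 0, 0, 0, 0))

# Grouping by ID alone: one nonzero score removes the ID at every timestamp
by_id <- df2 %>% group_by(ID) %>% filter(all(score == 0)) %>% ungroup()

# Grouping by timestamp and ID: each replicate pair is judged separately
by_ts_id <- df2 %>% group_by(timestamp, ID) %>% filter(all(score == 0)) %>% ungroup()

nrow(by_id)     # 4: only ID 55
nrow(by_ts_id)  # 6: ID 55 at both timestamps, plus ID 57 at timestamp 2
```

On the data in the question both groupings return the same four rows, because ID 57 has a nonzero score at both timestamps.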

Answer 2

An approach with data.table

library(data.table)
# setDT() converts df to a data.table by reference; group by timestamp and ID
setDT(df)[, .SD[all(score == 0)], by = .(timestamp, ID)]
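
For comparison, the same per-group check can be sketched in base R without extra packages. This is not from either answer; it is one possible variant that uses ave() to compute, for every row, whether all scores in its timestamp and ID group are zero. The name keep is just illustrative.

```r
# The data frame from the question, rebuilt so the sketch is self-contained
df <- data.frame(timestamp = c(1, 1, 1, 1, 2, 2, 2, 2),
                 ID        = c(57, 57, 55, 55, 57, 57, 55, 55),
                 replicate = c(1, 2, 1, 2, 1, 2, 1, 2),
                 score     = c(0, 1, 0, 0, 0, 1, 0, 0))

# ave() applies FUN within each timestamp/ID group and returns one value per
# row; the logical result is stored as 0/1 because df$score is numeric.
keep <- ave(df$score, df$timestamp, df$ID,
            FUN = function(s) all(s == 0)) == 1

res <- df[keep, ]
res
```

The result keeps the original row names (3, 4, 7, 8); calling rownames(res) <- NULL resets them if a clean index is wanted.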

Answer 1: score 3, accepted, 2019-12-13 13:01:51

Answer 2: score 2, 2019-12-13 14:38:10
