Filtering dataframe by grouped replicate in R
I have the following data frame from an experiment with two replicates. I want to filter df so that it keeps only the rows where score == 0 in both replicates for each timestamp & ID.
df <- data.frame(timestamp = c(1, 1, 1, 1, 2, 2, 2, 2),
                 ID = c(57, 57, 55, 55, 57, 57, 55, 55),
                 replicate = c(1, 2, 1, 2, 1, 2, 1, 2),
                 score = c(0, 1, 0, 0, 0, 1, 0, 0))
E.g. the desired output would be:
target <- data.frame(timestamp = c(1, 1, 2, 2),
                     ID = c(55, 55, 55, 55),
                     replicate = c(1, 2, 1, 2),
                     score = c(0, 0, 0, 0))
I've come up with a solution using a double loop, which is inelegant and most likely inefficient:
library(dplyr) # provides %>% and filter()

tsvec <- df$timestamp %>% unique
idvec <- df$ID %>% unique
df_out <- c()
for (i in seq_along(tsvec)) { # loop along timestamps
  innerdat <- df %>% filter(timestamp == tsvec[i])
  for (j in seq_along(idvec)) { # loop along IDs
    innerdat2 <- innerdat %>% filter(ID == idvec[j])
    # the sum is 0 only if every score is 0 (assuming scores are never negative)
    if (sum(innerdat2$score) == 0) {
      df_out <- rbind(df_out, innerdat2)
    }
  }
}
Does anybody have a dplyr way of making this more efficient?
library(dplyr)
df %>% group_by(ID) %>% filter(all(score==0))
# A tibble: 4 x 4
# Groups:   ID [1]
  timestamp    ID replicate score
      <dbl> <dbl>     <dbl> <dbl>
1         1    55         1     0
2         1    55         2     0
3         2    55         1     0
4         2    55         2     0
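Note that the answer above groups by ID only, while the question asks for the check to be made per timestamp & ID. On the example data both give the same rows, but they can differ on other data. A minimal sketch of grouping by both columns follows; df2 is a hypothetical variant of the question's data (not from the original post) in which ID 55 has one non-zero score at timestamp 2.

```r
library(dplyr)

df <- data.frame(timestamp = c(1, 1, 1, 1, 2, 2, 2, 2),
                 ID = c(57, 57, 55, 55, 57, 57, 55, 55),
                 replicate = c(1, 2, 1, 2, 1, 2, 1, 2),
                 score = c(0, 1, 0, 0, 0, 1, 0, 0))

# Keep a (timestamp, ID) group only if every replicate in it has score 0
res <- df %>%
  group_by(timestamp, ID) %>%
  filter(all(score == 0)) %>%
  ungroup()

# Hypothetical variant: ID 55, timestamp 2, replicate 2 now has score 1
df2 <- df
df2$score[8] <- 1

# Grouping by ID alone drops ID 55 entirely ...
by_id <- df2 %>% group_by(ID) %>% filter(all(score == 0)) %>% ungroup()

# ... while grouping by timestamp and ID keeps its timestamp 1 rows
by_ts_id <- df2 %>% group_by(timestamp, ID) %>% filter(all(score == 0)) %>% ungroup()
```

The ungroup() call at the end is optional; it just returns a plain tibble so that later steps are not silently applied per group.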
An approach with data.table:
library(data.table)
setDT(df)[, .SD[all(score == 0)], by = ID]
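The data.table call above also groups by ID only. If the condition should hold per timestamp & ID, as the question describes, one possible sketch is to pass both columns to by. It uses as.data.table() instead of setDT() so that df is not modified by reference; that choice is an assumption about what is wanted, not part of the original answer.

```r
library(data.table)

df <- data.frame(timestamp = c(1, 1, 1, 1, 2, 2, 2, 2),
                 ID = c(57, 57, 55, 55, 57, 57, 55, 55),
                 replicate = c(1, 2, 1, 2, 1, 2, 1, 2),
                 score = c(0, 1, 0, 0, 0, 1, 0, 0))

dt <- as.data.table(df) # copy, leaves df untouched

# Return the group's rows (.SD) only when all of its scores are 0;
# groups that fail the test return NULL and are dropped
res_dt <- dt[, if (all(score == 0)) .SD, by = .(timestamp, ID)]
```

The grouping columns come first in the result, followed by replicate and score.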