Remove rows from data.table in R based on values of several columns
I have a data.table in R which has several ids and a value. For each combination of ids, there are several rows. If one of these rows contains NA in the column 'value', I would like to remove all rows with this combination of ids. For example, in the table below, I would like to remove all rows for which id1 == 2 and id2 == 1.
If I had only one id, I would do
dat[!(id1 %in% dat[is.na(value), id1])]
In the example, this would remove all rows where id1 == 2. However, I did not manage to extend this to several columns.
library(data.table)
dat <- data.table(id1 = c(1, 1, 2, 2, 2, 2),
                  id2 = c(1, 2, 1, 2, 3, 1),
                  value = c(5, 3, NA, 6, 7, 3))
If you want to check, per combination of id1 and id2, whether any of the values are NA and then remove that whole combination, you can insert an if statement per group and only retrieve the results (using .SD) if that statement returns TRUE.
dat[, if(!anyNA(value)) .SD, by = .(id1, id2)]
#    id1 id2 value
# 1:   1   1     5
# 2:   1   2     3
# 3:   2   2     6
# 4:   2   3     7
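To see why this drops the whole group, it may help to look at the pieces separately. The following is only a minimal sketch on the example data; the names dropped, flags and res are made up for illustration. It relies on two facts: an if without else evaluates to NULL when its condition is FALSE, and data.table leaves out groups whose j expression returns NULL.

```r
library(data.table)

dat <- data.table(id1 = c(1, 1, 2, 2, 2, 2),
                  id2 = c(1, 2, 1, 2, 3, 1),
                  value = c(5, 3, NA, 6, 7, 3))

# An `if` without `else` evaluates to NULL when the condition is FALSE.
dropped <- if (FALSE) "kept"
is.null(dropped)

# One row per (id1, id2) combination, flagging the groups that contain an NA.
flags <- dat[, .(has_na = anyNA(value)), by = .(id1, id2)]
flags

# Groups for which j evaluates to NULL contribute no rows to the result,
# so only the combinations without any NA survive.
res <- dat[, if (!anyNA(value)) .SD, by = .(id1, id2)]
res
```

Printing flags should show that only the combination id1 == 2, id2 == 1 is marked, which is exactly the group missing from res.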
Or similarly,
dat[, if(all(!is.na(value))) .SD, by = .(id1, id2)]
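Another possible route, closer to the one-id attempt in the question, is an anti-join on both id columns. This is not part of the answer above, just a sketch of an alternative; the names bad and res are hypothetical. It first collects the id combinations that have an NA and then keeps only the rows of dat that do not match any of them.

```r
library(data.table)

dat <- data.table(id1 = c(1, 1, 2, 2, 2, 2),
                  id2 = c(1, 2, 1, 2, 3, 1),
                  value = c(5, 3, NA, 6, 7, 3))

# Unique (id1, id2) combinations with at least one NA in `value`.
bad <- unique(dat[is.na(value), .(id1, id2)])

# Anti-join: keep the rows of `dat` whose (id1, id2) does not appear in `bad`.
res <- dat[!bad, on = .(id1, id2)]
res
```

Unlike the grouped version, this does not split the table by group, and it keeps the rows in their original order, which may matter on larger tables.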