Replace a numerical value by NA based on conditions from other columns
I am new to the data.table package, so please bear with my simple question. I have a dataset that looks like DT:
library(data.table)
DT <- data.table(a = sample(c("C","M","Y","K"), 100, replace = TRUE),
                 b = sample(c("A","S"), 100, replace = TRUE),
                 f = round(rnorm(n = 100, mean = .90, sd = .08), digits = 2)); DT
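If a certain condition is met, I want to replace the value in column f with NA. For example, to blank out f whenever it falls outside the range 0.85 to 0.90 (that is, f < 0.85 or f > 0.90),
I would use the following condition: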
DT$a == "C" & DT$b == "S" & DT$f < .85| DT$a == "C" & DT$b == "S" & DT$f >.90
I would also like to have a different condition for each categorical value in columns a and b.
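Using the condition you have already stated, but without the DT$ prefix, in the i field subsets the data.table to the rows that satisfy it. You can then use the := operator in the j field to assign NA to f by reference. That is,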
DT[a == "C" & b == "S" & f < .85 | a == "C" & b == "S" & f >.90, f := NA]
which(is.na(DT$f))
# [1] 3 16 31 89
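Because DT above is built from random draws, the row indices printed by which() will differ from run to run. As a minimal sketch on a hand-made table (the three rows below are made up for illustration and are not from the question), the same assignment can be checked deterministically. NA_real_ is used here only to make the numeric type explicit; plain NA works as well.

```r
library(data.table)

# Hypothetical toy data: three rows chosen so the outcome is easy to verify
DT <- data.table(a = c("C", "C", "M"),
                 b = c("S", "S", "A"),
                 f = c(0.80, 0.87, 0.95))

# i selects the rows, j assigns NA to f by reference (DT itself is modified)
DT[a == "C" & b == "S" & (f < 0.85 | f > 0.90), f := NA_real_]

DT
```

Only the first row meets the condition, so only its f becomes NA. The third row has f > 0.90 but a is "M", so it is left alone.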
Edit: following the OP's comment and @Joshua's good suggestion:
`%between%` <- function(x, vals) { x >= vals[1] & x <= vals[2]}
`%nbetween%` <- Negate(`%between%`)
DT[a %in% c("C", "M", "Y", "K") & b == "S" & f %nbetween% c(0.85, 0.90), f := NA]
%nbetween% is the negation of %between% and gives the desired result (f < 0.85 or f > 0.90). Also note the use of %in% to check a against multiple values.
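As a small check of what the negated operator returns, here is a sketch on a made-up vector. It also compares the result with data.table's own between() function, which is inclusive of both bounds by default; the vector x is an assumption for illustration only.

```r
library(data.table)

# Same helper definitions as in the answer (this masks data.table's %between%)
`%between%` <- function(x, vals) { x >= vals[1] & x <= vals[2] }
`%nbetween%` <- Negate(`%between%`)

# Hypothetical values around the 0.85 to 0.90 range
x <- c(0.80, 0.85, 0.88, 0.90, 0.93)

outside_custom  <- x %nbetween% c(0.85, 0.90)
outside_builtin <- !between(x, 0.85, 0.90)  # data.table::between, inclusive bounds

outside_custom
```

Both versions flag only 0.80 and 0.93, because the bounds themselves count as inside the range.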
Edit 2: after the OP's complete rewrite, I'm afraid there is not much you can do beyond grouping b == "A" and b == "S" together:
`%nbetween%` <- Negate(`%between%`)
DT[a == "M" & b %in% c("A", "S") & f %nbetween% c(.85, .90), f := NA]
DT[a == "Y" & b %in% c("A", "S") & f %nbetween% c(.90, .95), f := NA]  # bounds must be c(lower, upper)
DT[a == "K" & b %in% c("A", "S") & f %nbetween% c(.95, 1.10), f := NA]
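If the number of categories grows, one alternative to writing a line per value of a is to keep the bounds in a small lookup table and apply them with an update join. The following is only a sketch under assumptions: the table name limits, its columns lower and upper, and all the numbers in it are made up for illustration and are not the OP's actual thresholds.

```r
library(data.table)

# Hypothetical data and hypothetical per-category bounds
DT <- data.table(a = c("M", "M", "Y", "K", "K"),
                 b = c("A", "S", "A", "S", "A"),
                 f = c(0.80, 0.87, 0.97, 1.00, 1.20))

limits <- data.table(a     = c("M", "Y", "K"),
                     lower = c(0.85, 0.90, 0.95),
                     upper = c(0.90, 0.95, 1.10))

# Update join: for each row of DT matching limits on a, set f to NA
# when it lies outside [lower, upper]; i.lower / i.upper come from limits
DT[limits, on = "a",
   f := fifelse(f < i.lower | f > i.upper, NA_real_, f)]

DT
```

Rows whose a has no entry in limits are left untouched, so categories without a rule keep their values.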