data.table: calculation conditional on presence of value within group
With some data
library(data.table); set.seed(42)
dat <- data.table(id=1:5, group=c(1,1,1,2,2), time=c(1,2,3,1,2), val=runif(5))
> dat
id group time val
1: 1 1 1 0.9148060
2: 2 1 2 0.9370754
3: 3 1 3 0.2861395
4: 4 2 1 0.8304476
5: 5 2 2 0.6417455
and I would like to apply some calculation, say val*2, to time point 2, but only in those groups that have no third time point. The expected output is therefore
> res
id group time val
1: 1 1 1 0.9148060
2: 2 1 2 0.9370754
3: 3 1 3 0.2861395
4: 4 2 1 0.8304476
5: 5 2 2 1.2834910
where the value at time 2 in group 2 was changed. I suspected it would be something along the lines of
dat[,val:=val[max(time)==2]*2, by=group]
but this would not work. Because I want to apply the calculation to a different time point than the one I am subsetting on, I felt this can't be done in i, but I don't know how to do it otherwise.
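One plausible reading of why the attempt misbehaves, as a minimal sketch on plain vectors (the variable names are made up for illustration): max(time)==2 yields a single TRUE or FALSE per group, so using it as an index selects either every value in the group or none at all, never just the row at time 2.

```r
# Values and times of the two groups from the example data
val_g1 <- c(0.9148060, 0.9370754, 0.2861395); time_g1 <- c(1, 2, 3)
val_g2 <- c(0.8304476, 0.6417455);            time_g2 <- c(1, 2)

# The condition is one logical per group, not one per row
cond_g1 <- max(time_g1) == 2   # FALSE
cond_g2 <- max(time_g2) == 2   # TRUE

# A single TRUE index is recycled, so every value of group 2 is doubled
doubled_g2 <- val_g2[cond_g2] * 2

# A single FALSE index selects nothing, leaving a zero-length result for group 1
empty_g1 <- val_g1[cond_g1] * 2

print(doubled_g2)
print(length(empty_g1))
```

So the right-hand side never has the shape that := needs: it either touches the time 1 row as well or contains nothing to assign.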
Based on my previous answer (before the edit) and on @Axeman's, you could do the following:
dat[, val2 := if(max(time) == 2) ifelse(time==2, 2*val, val) else val, group][]
## id group time val val2
## 1: 1 1 1 0.9148060 0.9148060
## 2: 2 1 2 0.9370754 0.9370754
## 3: 3 1 3 0.2861395 0.2861395
## 4: 4 2 1 0.8304476 0.8304476
## 5: 5 2 2 0.6417455 1.2834910
and replace 2*val with any function you like.
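As a rough sketch of that substitution, the calculation can be pulled out into a function. The name transform_val is hypothetical and not part of the original answer, and the values are typed in rather than drawn with runif so the snippet is self-contained.

```r
library(data.table)

# Example data with the values shown in the question
dat <- data.table(id = 1:5, group = c(1, 1, 1, 2, 2), time = c(1, 2, 3, 1, 2),
                  val = c(0.9148060, 0.9370754, 0.2861395, 0.8304476, 0.6417455))

# Hypothetical stand-in for "any function you like"; swap the body as needed
transform_val <- function(x) x * 2

# Per group: if the last time point is 2, transform the time-2 row, else keep val
dat[, val2 := if (max(time) == 2) ifelse(time == 2, transform_val(val), val) else val,
    by = group]
print(dat)
```

Any vectorised function that returns one value per input value should slot in the same way, since ifelse expects its branches to match the length of the test.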
Like this:
dat[, val := val*(1 + (time==2 & max(time)==2)), by=group]
## id group time val
## 1: 1 1 1 0.9148060
## 2: 2 1 2 0.9370754
## 3: 3 1 3 0.2861395
## 4: 4 2 1 0.8304476
## 5: 5 2 2 1.2834910
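The one-liner relies on R coercing logicals to 0 and 1 in arithmetic. A small sketch on plain vectors (the names are illustrative only) shows the multiplier it builds for each group:

```r
# Time vectors of the two groups from the example data
time_g1 <- c(1, 2, 3)
time_g2 <- c(1, 2)

# TRUE only on the time-2 row of a group whose last time point is 2
flag_g1 <- time_g1 == 2 & max(time_g1) == 2
flag_g2 <- time_g2 == 2 & max(time_g2) == 2

# Adding 1 turns FALSE/TRUE into a multiplier of 1 or 2
mult_g1 <- 1 + flag_g1
mult_g2 <- 1 + flag_g2

print(mult_g1)
print(mult_g2)
```

This works neatly for doubling; for an arbitrary calculation, a form based on ifelse is probably easier to adapt.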
The data is sorted by time, so we can join on the last row per group and edit it if and only if it meets the criterion:
dat[.(unique(group)), on=.(group), mult="last",
val := if (time == 2) val*2 else val
, by=.EACHI]
We can use if / else because mult="last" (and nomatch=NA) guarantees that time has a length of 1. (This contrasts with the other two answers, where the full time vector for each group is handled.)
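To check what the join actually selects, here is a minimal sketch that runs the same kind of join without the assignment. It assumes the data may not already be ordered, so it calls setorder first, and it names the join column explicitly in i. Both choices are illustrative additions rather than part of the answer above.

```r
library(data.table)

# Example data with the values shown in the question
dat <- data.table(id = 1:5, group = c(1, 1, 1, 2, 2), time = c(1, 2, 3, 1, 2),
                  val = c(0.9148060, 0.9370754, 0.2861395, 0.8304476, 0.6417455))

# Make the "sorted by time" premise explicit (reorders dat by reference)
setorder(dat, group, time)

# One lookup value per group; mult = "last" keeps only the final matching row
last_rows <- dat[.(group = unique(dat$group)), on = "group", mult = "last"]
print(last_rows)
```

Each group contributes exactly one row here, which is why a scalar if / else on time is safe in the update.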