
data.table: calculation condition on presence of value within group

With some data

library(data.table); set.seed(42)
dat <- data.table(id=1:5, group=c(1,1,1,2,2), time=c(1,2,3,1,2), val=runif(5))
> dat
   id group time       val
1:  1     1    1 0.9148060
2:  2     1    2 0.9370754
3:  3     1    3 0.2861395
4:  4     2    1 0.8304476
5:  5     2    2 0.6417455

and I would like to apply some calculation, say val*2, to time point 2, but only in those groups that have no third time point. The expected output is therefore

> res
   id group time       val
1:  1     1    1 0.9148060
2:  2     1    2 0.9370754
3:  3     1    3 0.2861395
4:  4     2    1 0.8304476
5:  5     2    2 1.2834910

where the value at time 2 in group 2 was changed. I suspected it would be something along the lines of

dat[,val:=val[max(time)==2]*2, by=group]

but this does not work. Because I want to apply the calculation to a different time point than the one I am subsetting on, I felt this can't be done in i, but I don't know how to do it instead.
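For what it's worth, a base R sketch of why the attempt misbehaves: max(time)==2 is a single logical value, so using it as an index either keeps every element of val (TRUE) or none of them (FALSE). The vectors below are made-up stand-ins for the two groups, not the real data.

```r
# Illustrative values only (rounded stand-ins for the two groups).
val_g1  <- c(0.91, 0.94, 0.29); time_g1 <- c(1, 2, 3)
val_g2  <- c(0.83, 0.64);       time_g2 <- c(1, 2)

# Group 1: the index is FALSE, so nothing is selected.
res_g1 <- val_g1[max(time_g1) == 2] * 2   # zero-length numeric vector

# Group 2: the index is TRUE and is recycled, so every value is doubled,
# not just the one at time 2.
res_g2 <- val_g2[max(time_g2) == 2] * 2

print(res_g1)
print(res_g2)
```

So the group-level condition and the row-level condition (time == 2) need to be expressed separately.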

Based on my previous answer (before the edit) and on @Axeman's, you could do the following

dat[, val2 := if(max(time) == 2) ifelse(time==2, 2*val, val) else val, group][]
##     id group time       val      val2
##  1:  1     1    1 0.9148060 0.9148060
##  2:  2     1    2 0.9370754 0.9370754
##  3:  3     1    3 0.2861395 0.2861395
##  4:  4     2    1 0.8304476 0.8304476
##  5:  5     2    2 0.6417455 1.2834910

and replace 2*val with any function you like.
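As a minimal sketch of that substitution, the calculation can be pulled out into a function. The name f below is hypothetical (it is not part of the original answer) and is assumed to be vectorised over val.

```r
library(data.table)
set.seed(42)
dat <- data.table(id = 1:5, group = c(1, 1, 1, 2, 2),
                  time = c(1, 2, 3, 1, 2), val = runif(5))

# Hypothetical transformation; swap in any vectorised function of val.
f <- function(x) 2 * x

# Group-level test first (no third time point), then row-level test (time == 2).
dat[, val2 := if (max(time) == 2) ifelse(time == 2, f(val), val) else val,
    by = group]
print(dat)
```

With f defined as above this should give the same val2 column as the original one-liner.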

Like this:

dat[, val := val*(1 + (time==2 & max(time)==2)), by=group]
##    id group time       val
## 1:  1     1    1 0.9148060
## 2:  2     1    2 0.9370754
## 3:  3     1    3 0.2861395
## 4:  4     2    1 0.8304476
## 5:  5     2    2 1.2834910
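To see why this works, note that the logical condition is coerced to 0/1 when added to 1, so the multiplier is 2 only where both tests hold and 1 everywhere else. A small base R illustration using the time values of the two groups:

```r
time_g1 <- c(1, 2, 3)   # group 1: has a third time point
time_g2 <- c(1, 2)      # group 2: stops at time 2

# TRUE counts as 1 and FALSE as 0 in arithmetic.
mult_g1 <- 1 + (time_g1 == 2 & max(time_g1) == 2)   # all ones: nothing changes
mult_g2 <- 1 + (time_g2 == 2 & max(time_g2) == 2)   # 1 for time 1, 2 for time 2

print(mult_g1)
print(mult_g2)
```

This trick is specific to a multiplicative change; for an arbitrary function an explicit if / ifelse is probably clearer.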

The data is sorted by time, so we can join on the last row of each group and edit it if and only if it meets the criterion:

dat[.(unique(group)), on=.(group), mult="last",
    val := if (time == 2) val*2 else val,
    by=.EACHI]

We can use if / else because mult="last" (together with nomatch=NA) guarantees that time has length 1. (This contrasts with the other two answers, which handle the full time vector for each group.)
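A related sketch, not the join used above: the same "last row per group" idea can be written with row indices via .I and .N. It relies on the same assumption that rows are sorted by time within each group, and the variable name last_rows is made up for illustration.

```r
library(data.table)
set.seed(42)
dat <- data.table(id = 1:5, group = c(1, 1, 1, 2, 2),
                  time = c(1, 2, 3, 1, 2), val = runif(5))

# Row number of the last row in each group (assumes sorting by time).
last_rows <- dat[, .I[.N], by = group]$V1

# Keep only those last rows whose time is 2, i.e. groups with no third point.
last_rows <- last_rows[dat$time[last_rows] == 2]

# Update just those rows by reference.
dat[last_rows, val := val * 2]
print(dat)
```

Only the final row of group 2 should change here, since group 1 ends at time 3.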

