
Change from baseline for repeated ids

For example,

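set.seed(1)
# Note: the 'val' values printed below were drawn with the pre-R 3.6.0 sampler.
# R 3.6.0 changed sample()'s default algorithm, so newer versions draw different
# values unless RNGkind(sample.kind = "Rounding") is called before set.seed().
df1 <- data.frame(ID  = c(rep(c(rep(1, 3), rep(2, 3)), 2), rep(c(rep(3, 3), rep(4, 3)), 2)),
                  Day = rep(c(1, 2, 3), 8))
df2 <- data.frame(measure = c(rep("mean", 6), rep("median", 6), rep("mean", 6), rep("median", 6)),
                  val     = sample(1:24, 24))

data <- cbind(df1, df2)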

> data

    ID Day measure val
1   1   1    mean   7
2   1   2    mean   9
3   1   3    mean  13
4   2   1    mean  20
5   2   2    mean   5
6   2   3    mean  18
7   1   1  median  19
8   1   2  median  12
9   1   3  median  11
10  2   1  median   1
11  2   2  median   3
12  2   3  median  14
13  3   1    mean  23
14  3   2    mean  21
15  3   3    mean   8
16  4   1    mean  16
17  4   2    mean   6
18  4   3    mean  24
19  3   1  median  22
20  3   2  median   4
21  3   3  median  17
22  4   1  median  15
23  4   2  median   2
24  4   3  median  10
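Because the printed 'val' column depends on which sampling algorithm the R version uses, here is a minimal sketch that rebuilds the same data frame with 'val' copied verbatim from the printout above, so the answers below give the same results on any R version. The ID/Day/measure construction mirrors the original code; only 'val' is hard-coded.

```r
# Rebuild the example data with the printed 'val' values hard-coded,
# so the result does not depend on the R version's sampling algorithm.
data <- data.frame(
  ID      = c(rep(c(rep(1, 3), rep(2, 3)), 2), rep(c(rep(3, 3), rep(4, 3)), 2)),
  Day     = rep(c(1, 2, 3), 8),
  measure = rep(rep(c("mean", "median"), each = 6), 2),
  val     = c(7, 9, 13, 20, 5, 18, 19, 12, 11, 1, 3, 14,
              23, 21, 8, 16, 6, 24, 22, 4, 17, 15, 2, 10)
)
str(data)
```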

I want to create another variable, 'change', that measures the change from Day 1 for each measure within each ID:

    ID Day measure val change
1   1   1    mean   7    0
2   1   2    mean   9    2
3   1   3    mean  13    6
4   2   1    mean  20    0
5   2   2    mean   5  -15
6   2   3    mean  18   -2
7   1   1  median  19    0
8   1   2  median  12   -7
9   1   3  median  11   -8
10  2   1  median   1    0
11  2   2  median   3    2
12  2   3  median  14   13
13  3   1    mean  23    0
14  3   2    mean  21   -2
15  3   3    mean   8   -15
16  4   1    mean  16    0
17  4   2    mean   6   -10
18  4   3    mean  24    8
19  3   1  median  22    0
20  3   2  median   4   -18
21  3   3  median  17   -5
22  4   1  median  15    0
23  4   2  median   2   -13
24  4   3  median  10   -5

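I've been trying to adapt the code from the question on calculating change from baseline with data in long format, but my dataset has repeated measures (every ID has both a 'mean' and a 'median' series).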

We can use data.table to create the 'change' column. Convert the 'data.frame' to a 'data.table' (setDT(data)), group by 'ID' and 'measure', and subtract each group's Day 1 'val' (val[Day == 1L]) from 'val' to create 'change'.

library(data.table)
setDT(data)[, change:= val-val[Day==1L], by = .(ID, measure)]
data
#    ID Day measure val change
# 1:  1   1    mean   7      0
# 2:  1   2    mean   9      2
# 3:  1   3    mean  13      6
# 4:  2   1    mean  20      0
# 5:  2   2    mean   5    -15
# 6:  2   3    mean  18     -2
# 7:  1   1  median  19      0
# 8:  1   2  median  12     -7
# 9:  1   3  median  11     -8
#10:  2   1  median   1      0
#11:  2   2  median   3      2
#12:  2   3  median  14     13
#13:  3   1    mean  23      0
#14:  3   2    mean  21     -2
#15:  3   3    mean   8    -15
#16:  4   1    mean  16      0
#17:  4   2    mean   6    -10
#18:  4   3    mean  24      8
#19:  3   1  median  22      0
#20:  3   2  median   4    -18
#21:  3   3  median  17     -5
#22:  4   1  median  15      0
#23:  4   2  median   2    -13
#24:  4   3  median  10     -5
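Within each ID/measure group, val - val[Day == 1L] is ordinary vector arithmetic: val[Day == 1L] is a single number, and R recycles it across the group. A base R sketch for one group (ID 2, 'mean', values copied from the table above):

```r
# One ID/measure group: ID 2, measure "mean" (rows 4-6 of the example data)
Day <- c(1, 2, 3)
val <- c(20, 5, 18)

# val[Day == 1] is the single baseline value (20), recycled over val
change <- val - val[Day == 1]
print(change)  # change is c(0, -15, -2)
```

This also shows the assumption behind both the data.table and dplyr versions: each group needs exactly one Day 1 row. If a group has none, val[Day == 1] is empty and the subtraction gives a zero-length result, which the grouped assignment is then unlikely to accept.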

A similar option with dplyr is:

library(dplyr)
data %>% 
   group_by(ID, measure) %>%
   mutate(change = val- val[Day==1L])

Or, if the 'Day' column is ordered (each ID/measure group starts with its Day 1 row), a base R option with ave:

 data$change <- with(data, val-ave(val, ID, measure, FUN=function(x) head(x,1)))
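The head(x, 1) version silently picks the wrong baseline when rows are not sorted by 'Day' within each group. A sketch of an order-independent base R variant, assuming every ID/measure group has exactly one Day 1 row: ave() is applied to row indices instead of values, so every row receives the index of its group's Day 1 row. The small shuffled data frame is made up for illustration from the first six rows of the example.

```r
# Example rows (ID 1 and ID 2, "mean") deliberately out of Day order
data <- data.frame(
  ID      = c(1, 1, 1, 2, 2, 2),
  Day     = c(3, 1, 2, 2, 3, 1),
  measure = "mean",
  val     = c(13, 7, 9, 5, 18, 20)
)

# Order-independent variant (assumes exactly one Day 1 row per group).
# For each ID/measure group, FUN returns the row index of that group's
# Day 1 row; ave() recycles that single index over the whole group.
data$change <- with(data, {
  base_row <- ave(seq_along(val), ID, measure,
                  FUN = function(i) i[Day[i] == 1L])
  val - val[base_row]
})
data
```

Because only the row index of the baseline is looked up, the original row order is preserved and no sorting step is needed.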

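Or, if the rows are ordered so that every ID/measure block starts with its Day 1 row, another base R option that needs no grouping: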

 # i marks Day 1 rows; cumsum(i) is each row's block number, used to index the block baselines val[i]
 data$change <- with(data, {i <- Day == 1L; val - val[i][cumsum(i)]})
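The grouping-free version relies on cumsum() over the logical Day 1 indicator: it turns the indicator into block numbers, and indexing the vector of baseline values by those numbers repeats each baseline down its block. A toy sketch with the first six rows of the example (ID 1 and ID 2, 'mean'), assuming the rows are ordered so each block starts with Day 1:

```r
Day <- c(1, 2, 3, 1, 2, 3)
val <- c(7, 9, 13, 20, 5, 18)

i <- Day == 1L                  # TRUE at the start of each block
block <- cumsum(i)              # block number of each row: 1 1 1 2 2 2
baseline <- val[i]              # one baseline per block: 7 20
change <- val - baseline[block] # baseline[block] is 7 7 7 20 20 20
change                          # 0 2 6 0 -15 -2
```

Because only the position of the Day 1 rows is used, unsorted data gives wrong answers without any warning.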


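Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.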

 
© 2020-2024 STACKOOM.COM