Converting Repeated Measures mixed model formula from SAS to R

There are several questions and posts about mixed models for more complex experimental designs, so I thought this simpler model would help other beginners in this process, as well as me.

So, my question is: I would like to formulate a repeated measures ANCOVA in R, based on this SAS proc mixed procedure:

proc mixed data=df1;
FitStatistics=akaike
class GROUP person day;
model Y = GROUP X1 / solution alpha=.1 cl;
repeated / type=cs subject=person group=GROUP;
lsmeans GROUP;
run;

Here is the SAS output using the data created in R (below):

.           Effect       panel    Estimate       Error      DF    t Value    Pr > |t|     Alpha       Lower       Upper
            Intercept              -9.8693      251.04       7      -0.04      0.9697       0.1     -485.49      465.75
            panel        1         -247.17      112.86       7      -2.19      0.0647       0.1     -460.99    -33.3510
            panel        2               0           .       .        .         .             .           .           .
            X1                     20.4125     10.0228       7       2.04      0.0811       0.1      1.4235     39.4016

Below is how I formulated the model in R using the 'nlme' package, but I am not getting similar coefficient estimates:

## create reproducible example fake panel data set:
set.seed(94); subject.id = abs(round(rnorm(10)*10000,0))

set.seed(99); sds = rnorm(10,15,5);means = 1:10*runif(10,7,13);trends = runif(10,0.5,2.5)

this = NULL; set.seed(98)
for(i in 1:10) { this = c(this,rnorm(6, mean = means[i], sd = sds[i])*trends[i]*1:6)}
set.seed(97)
that = sort(rep(rnorm(10,mean = 20, sd = 3),6))

df1 = data.frame(day = rep(1:6,10), GROUP = c(rep('TEST',30),rep('CONTROL',30)),
                 Y = this,
                 X1 = that,
                 person = sort(rep(subject.id,6)))

## use package nlme
require(nlme)

## run repeated measures mixed model using compound symmetry covariance structure:
summary(lme(Y ~ GROUP + X1, random = ~ +1 | person,
            correlation=corCompSymm(form=~day|person), na.action = na.exclude,
            data = df1,method='REML'))

Now, the output from R, which I now realize is similar to the output from lm():

                Value Std.Error DF    t-value p-value
(Intercept) -626.1622  527.9890 50 -1.1859379  0.2413
GROUPTEST   -101.3647  156.2940  7 -0.6485518  0.5373
X1            47.0919   22.6698  7  2.0772934  0.0764

I believe I'm close on the specification, but I'm not sure what piece I'm missing to make the results match (within reason). Any help would be appreciated!

UPDATE: Using the code in the answer below, the R output becomes:

> summary(model2)

Scroll to the bottom for the parameter estimates. Look! They are essentially identical to SAS.

Linear mixed-effects model fit by REML
 Data: df1 
      AIC      BIC   logLik
  776.942 793.2864 -380.471

Random effects:
 Formula: ~GROUP - 1 | person
 Structure: Diagonal
        GROUPCONTROL GROUPTEST Residual
StdDev:      184.692  14.56864 93.28885

Correlation Structure: Compound symmetry
 Formula: ~day | person 
 Parameter estimate(s):
         Rho 
-0.009929987 
Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | GROUP 
 Parameter estimates:
    TEST  CONTROL 
1.000000 3.068837

Fixed effects: Y ~ GROUP + X1 

                Value Std.Error DF    t-value p-value
(Intercept)   -9.8706 251.04678 50 -0.0393178  0.9688
GROUPTEST   -247.1712 112.85945  7 -2.1900795  0.0647
X1            20.4126  10.02292  7  2.0365914  0.0811

Please try the code below:

model1 <- lme(
  Y ~ GROUP + X1,
  random = ~ GROUP | person,
  correlation = corCompSymm(form = ~ day | person),
  na.action = na.exclude, data = df1, method = "REML"
)
summary(model1)

I think the random = ~ groupvar | subjvar option of R's lme gives a result similar to the repeated / subject = subjvar group = groupvar option of SAS/MIXED in this case.

Edit:

SAS/MIXED

[Figure: SAS/MIXED covariance matrix]

R (a revised model2)

model2 <- lme(
  Y ~ GROUP + X1,
  random = list(person = pdDiag(form = ~ GROUP - 1)),
  correlation = corCompSymm(form = ~ day | person),
  weights = varIdent(form = ~ 1 | GROUP),
  na.action = na.exclude, data = df1, method = "REML"
)
summary(model2)

[Figure: R covariance matrix]

So, I think these covariance structures are very similar ($\sigma_{g1} = \tau_g^2 + \sigma_1$).
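Since the two matrix images did not survive, here is one way the comparison could be written out. It is a sketch under two assumptions: that type=cs with group= gives each person in group $k$ a block with $\sigma_{k1}$ everywhere plus $\sigma_k^2$ on the diagonal, and that the lme fit combines the random-effect variance $\tau_g^2$, a group-specific residual variance (written $\tilde{\sigma}_g^2$ here to keep it apart from the SAS parameter), and the compound-symmetry correlation $\rho$. The symbols $J_n$ (all-ones matrix), $I_n$ (identity), $n$ (days per person), $\rho$ and $\tilde{\sigma}_g^2$ are introduced for illustration only.

```latex
\begin{aligned}
V_k^{\text{SAS}} &= \sigma_{k1}\, J_n + \sigma_k^2\, I_n \\
V_g^{\text{R}}   &= \tau_g^2\, J_n + \tilde{\sigma}_g^2 \bigl[ (1-\rho)\, I_n + \rho\, J_n \bigr] \\[4pt]
\text{off-diagonal:}\quad \sigma_{g1} &= \tau_g^2 + \rho\, \tilde{\sigma}_g^2 \\
\text{diagonal:}\quad \sigma_{g1} + \sigma_g^2 &= \tau_g^2 + \tilde{\sigma}_g^2
\end{aligned}
```

Under these assumptions the term $\rho\,\tilde{\sigma}_g^2$ plays the role of $\sigma_1$ in the relation above.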

Edit 2:

Covariance parameter estimates (SAS/MIXED):

Cov Parm            Subject         Group             Estimate
Variance            person          GROUP TEST        8789.23
CS                  person          GROUP TEST         125.79
Variance            person          GROUP CONTROL       82775
CS                  person          GROUP CONTROL       33297

So

TEST group diagonal element
  = 125.79 + 8789.23
  = 8915.02
CONTROL group diagonal element
  = 33297 + 82775
  = 116072

where diagonal element = $\sigma_{k1} + \sigma_k^2$.

Covariance parameter estimates (R lme):

Random effects:
 Formula: ~GROUP - 1 | person
 Structure: Diagonal
        GROUP1TEST GROUP2CONTROL Residual
StdDev:   14.56864       184.692 93.28885

Correlation Structure: Compound symmetry
 Formula: ~day | person 
 Parameter estimate(s):
         Rho 
-0.009929987 
Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | GROUP 
 Parameter estimates:
   1TEST 2CONTROL 
1.000000 3.068837 

So

TEST group diagonal element
  = 14.56864^2 + (3.068837^0.5 * 93.28885 * -0.009929987) + 93.28885^2
  = 8913.432
CONTROL group diagonal element
  = 184.692^2  + (3.068837^0.5 * 93.28885 * -0.009929987) + (3.068837 * 93.28885)^2
  = 116070.5

where diagonal element = $\tau_g^2 + \sigma_1 + \sigma_g^2$.
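The same comparison can be scripted from the printed estimates. This is only a rough cross-check, assuming the marginal covariance of the lme fit is the random-effect variance plus the group's residual variance times a compound-symmetry correlation matrix. Under that assumption rho enters only the off-diagonal entries, which differs slightly from the hand calculation above. All variable names are made up for illustration; the numbers are copied from the SAS and R output quoted in this answer.

```r
# Estimates copied from the summary(model2) output
tau   <- c(TEST = 14.56864, CONTROL = 184.692)   # random-effect SDs per group
sigma <- 93.28885                                # residual SD (reference stratum: TEST)
w     <- c(TEST = 1.000000, CONTROL = 3.068837)  # varIdent SD multipliers
rho   <- -0.009929987                            # compound-symmetry correlation

# Assumed marginal covariance entries per group
resid_var <- (w * sigma)^2
r_diag    <- tau^2 + resid_var
r_offdiag <- tau^2 + rho * resid_var

# Estimates copied from the SAS covariance parameter table
sas_cs  <- c(TEST = 125.79, CONTROL = 33297)
sas_var <- c(TEST = 8789.23, CONTROL = 82775)

round(rbind(R_diagonal   = r_diag,
            SAS_diagonal = sas_cs + sas_var,
            R_offdiag    = r_offdiag,
            SAS_offdiag  = sas_cs), 2)
```

With these inputs the R-side and SAS-side rows agree to roughly four significant digits, which is about what the rounding of the printed estimates allows.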

Oooh, this is going to be a tricky one, and if it's even possible using standard nlme functions, it is going to take some serious study of Pinheiro/Bates.

Before you spend the time doing that, though, you should make absolutely sure that this is the exact model you need. Perhaps there's something else that might fit the story of your data better. Or maybe there's something R can do more easily that is just as good, but not quite the same.

First, here's my take on what you're doing in SAS with this line:

repeated / type=cs subject=person group=GROUP;

The type=cs subject=person part induces correlation between all the measurements on the same person, and that correlation is the same for all pairs of days. The group=GROUP part allows the correlation for each group to be different.

In contrast, here's my take on what your R code is doing:
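To make that concrete, here is a small sketch of the per-person covariance block that such a structure implies, assuming the usual compound-symmetry form (a common covariance in every cell plus an extra variance on the diagonal). The helper cs_block is hypothetical, not part of SAS or nlme, and the numbers are the SAS covariance parameter estimates quoted in the other answer.

```r
# Compound-symmetry block for one person:
# 'cs' in every cell, plus 'variance' added on the diagonal
cs_block <- function(variance, cs, n_days = 6) {
  cs * matrix(1, n_days, n_days) + variance * diag(n_days)
}

V_test    <- cs_block(variance = 8789.23, cs = 125.79)
V_control <- cs_block(variance = 82775,   cs = 33297)

# Implied correlation between any two days on the same person, by group
cov2cor(V_test)[1, 2]
cov2cor(V_control)[1, 2]
```

With these estimates the implied within-person correlation is about 0.01 in the TEST group and about 0.29 in the CONTROL group, and the diagonal variances differ as well.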

random = ~ +1 | person,
correlation=corCompSymm(form=~day|person)

This code is actually adding almost the same effect in two different ways: the random line adds a random effect for each person, and the correlation line induces correlation between all the measurements on the same person. However, these two things are almost identical; if the correlation is positive, you get the exact same result by including either of them. I'm not sure what happens when you include both, but I do know that only one is necessary. Regardless, this code has the same correlation for all individuals; it does not allow each group to have its own correlation.

To let each group have its own correlation, I think you have to build a more complicated correlation structure out of two different pieces; I've never done this, but I'm pretty sure I remember Pinheiro/Bates doing it.
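A quick way to see that equivalence is to fit both versions to a data set where the within-subject correlation is clearly positive. The sketch below swaps in the Orthodont data that ships with nlme (not the poster's df1), so the variable names distance, age and Subject come from that data set; the object names are arbitrary.

```r
library(nlme)

# Version 1: random intercept per subject
fit_random <- lme(distance ~ age, random = ~ 1 | Subject,
                  data = Orthodont, method = "REML")

# Version 2: no random effect, compound-symmetry correlation within subject
fit_cs <- gls(distance ~ age,
              correlation = corCompSymm(form = ~ 1 | Subject),
              data = Orthodont, method = "REML")

# Compare the restricted log-likelihoods and the fixed-effect estimates
logLik(fit_random)
logLik(fit_cs)
cbind(lme = fixef(fit_random), gls = coef(fit_cs))
```

If the two parameterizations describe the same marginal model, the log-likelihoods and coefficients should agree up to optimizer tolerance. When the estimated correlation would be negative, the random-intercept version cannot represent it, and the fits can differ.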

You might consider instead adding a random effect for person and then letting the variance be different for the different groups with weights=varIdent(form=~1|GROUP) (from memory, check my syntax, please). This won't quite be the same but tells a similar story. The story in SAS is that the measurements on some individuals are more correlated than the measurements on other individuals. Thinking about what that means, the measurements for individuals with higher correlation will be closer together than the measurements for individuals with lower correlation. In contrast, the story in R is that the variability of measurements within individuals varies; thinking about that, measurements with higher variability will have lower correlation. So they do tell similar stories, but come at it from opposite sides.

It is even possible (but I would be surprised) that these two models end up being different parameterizations of the same thing. My intuition is that the overall measurement variability will be different in some way. But even if they aren't the same thing, it would be worth writing out the parameterizations just to be sure you understand them and to make sure that they are appropriately describing the story of your data.
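For what it's worth, a minimal sketch of that suggestion looks like the following. It again uses nlme's Orthodont data as a stand-in, with Subject in the role of person and Sex in the role of GROUP, so treat the variable names as placeholders to be replaced with your own.

```r
library(nlme)

# Random intercept per subject, one residual variance for everyone
fit_equal <- lme(distance ~ age + Sex, random = ~ 1 | Subject,
                 data = Orthodont, method = "REML")

# Same model, but each Sex stratum gets its own residual standard deviation
fit_hetero <- lme(distance ~ age + Sex, random = ~ 1 | Subject,
                  weights = varIdent(form = ~ 1 | Sex),
                  data = Orthodont, method = "REML")

# Ratio of residual SDs relative to the reference stratum
coef(fit_hetero$modelStruct$varStruct, unconstrained = FALSE)

# Likelihood-ratio comparison (both fits share the same fixed effects)
anova(fit_equal, fit_hetero)
```

For the data in the question the analogous call would presumably use Y ~ GROUP + X1, random = ~ 1 | person and varIdent(form = ~ 1 | GROUP), though I have not run that version.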
