Stuck with package example code in R - simulating data to fit a model

I am trying to understand the function indeptCoxph in the spBayesSurv package. This function fits a Bayesian proportional hazards model. I am stuck on parts of the R code as well as on the Cox model theory.

I am working through the authors' example (below). They first simulate survival time data, and I am having trouble following their code for doing this. It seems to me that they are simulating survival times from an exponential distribution with CDF F(t) = 1 - exp(-lambda*t), except that lambda is exp(sum(xi * betaT)) rather than a constant. To simulate data, the parameter betaT is fixed at a constant, which is its true value, and xi is the predictor data.

Question 1 - Is this form of lambda due to the Cox proportional hazards model? In this example, are the authors making special assumptions about the survival distribution?

Question 2 - I am stuck on the following key piece of code, which generates the survival time data (it relies on earlier code, given at the end):

    ## Generate survival times t
    u = pnorm(z);
    t = rep(0, ntot);
    for (i in 1:ntot){
      t[i] = Finv(u[i], x[i]);
    }
    tTrue = t; #plot(x,t);

The function Finv(u, xi) returns the survival time t that satisfies F(t) = u, where I think xi is the predictor variable. I don't understand why u has to come from the normal CDF. They generate z as a single draw from a multivariate normal distribution (with ntot components), and u is the vector of normal CDF values, u = pnorm(z). Again, I am not sure why u has to be generated this way; it would be really helpful if the relationship between u, z, t and lambda could be clarified. The covariance matrix for z is also built by the authors from the two coordinate vectors s1 and s2 in the code, but it is unclear to me what role s1 and s2 would play if I were just fitting a model with survival time data t and predictor variable x.

Authors' code:

    ###############################################################
    # A simulated data: Cox PH
    ###############################################################

    rm(list=ls())
    library(survival)
    library(spBayesSurv)
    library(coda)
    library(MASS)
    ## True parameters
    betaT = c(-1);
    theta1 = 0.98; theta2 = 100000;
    ## generate coordinates:
    ## npred is the # of locations for prediction
    n = 100; npred = 30; ntot = n + npred;
    ldist = 100; wdist = 40;
    s1 = runif(ntot, 0, wdist); s2 = runif(ntot, 0, ldist);
    s = rbind(s1,s2); #plot(s[1,], s[2,]);
    ## Covariance matrix
    corT = matrix(1, ntot, ntot);
    for (i in 1:(ntot-1)){
      for (j in (i+1):ntot){
        dij = sqrt(sum( (s[,i]-s[,j])^2 ));
        corT[i,j] = theta1*exp(-theta2*dij);
        corT[j,i] = theta1*exp(-theta2*dij);
      }
    }
    ## Generate x
    x = runif(ntot,-1.5,1.5);
    ## Generate transformed log of survival times
    z = mvrnorm(1, rep(0, ntot), corT);
    ## The CDF of Ti: Lambda(t) = t;
    Fi = function(t, xi){
      res = 1-exp(-t*exp(sum(xi*betaT)));
      res[which(t<0)] = 0;
      res
    }
    ## The pdf of Ti:
    fi = function(t, xi){
      res=(1-Fi(t,xi))*exp(sum(xi*betaT));
      res[which(t<0)] = 0;
      res
    }
    #integrate(function(x) fi(x, 0), -Inf, Inf)
    ## true plot
    xx = seq(0, 10, 0.1)
    #plot(xx, fi(xx, -1), "l", lwd=2, col=2)
    #lines(xx, fi(xx, 1), "l", lwd=2, col=3)

    ## The inverse for CDF of Ti
    Finvsingle = function(u, xi) {
      res = uniroot(function (x) Fi(x, xi)-u, lower=0, upper=5000);
      res$root
    }
    Finv = function(u, xi) {sapply(u, Finvsingle, xi)};

    ## Generate survival times t
    u = pnorm(z);
    t = rep(0, ntot);
    for (i in 1:ntot){
      t[i] = Finv(u[i], x[i]);
    }
    tTrue = t; #plot(x,t);

Actually, the data are generated in the framework of a spatial copula Cox PH model. It is helpful to read Section 4.1 of the supplemental material of Zhou et al. (2015). As you are fitting a non-spatial PH model, the data can be generated without the use of s1 and s2; see the new example at https://stats.stackexchange.com/questions/253368/bayesian-survival-analysis .

In this new example, f0oft(t) and S0oft(t) are the baseline density and survival functions, respectively. For a subject with covariates x, Sioft(t,x) and fioft(t,x) are the survival and density functions for that subject. Finv(u,x) is the inverse of Fioft(t,x) = 1 - Sioft(t,x); that is, Finv(u,x) is the solution of Fioft(t,x) = u with respect to t.
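For reference, the relations between these functions can be written out as below, assuming the linked example follows the standard proportional hazards form. The symbols are notation introduced here, not taken from the original code: $\beta$ stands for betaT, $S_0$ and $f_0$ for S0oft and f0oft, and $\Lambda_0$ for the baseline cumulative hazard. The last line is the special case used in the question's code, whose comment states Lambda(t) = t.

```latex
\begin{aligned}
S_i(t \mid x) &= S_0(t)^{\exp(x^\top \beta)}, \\
f_i(t \mid x) &= \exp(x^\top \beta)\, S_0(t)^{\exp(x^\top \beta) - 1}\, f_0(t), \\
F_i(t \mid x) &= 1 - S_i(t \mid x), \\[4pt]
\Lambda_0(t) = t \;\Rightarrow\; S_0(t) = e^{-t}
  \;&\Rightarrow\; F_i(t \mid x) = 1 - \exp\!\bigl(-t\, e^{x^\top \beta}\bigr).
\end{aligned}
```

In that special case T | x is exponential with rate exp(x'β), which is why lambda in the question takes the form exp(sum(xi * betaT)); the exponential shape comes from the chosen baseline, not from the Cox model itself.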

To generate the survival data, we can first generate the covariates:

    x1 = rbinom(ntot, 1, 0.5); x2 = rnorm(ntot, 0, 1); X = cbind(x1, x2);

Given each subject's covariate vector (a row of X), the true survival time tT can be generated as:

    u = runif(ntot);
    tT = rep(0, ntot);
    for (i in 1:ntot){
      tT[i] = Finv(u[i], X[i,]);
    }

The rationale is the probability integral transform: if T | x ~ F(t,x), then F(T,x) ~ Uniform(0,1), so applying Finv to a Uniform(0,1) draw u yields a draw of T.
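A minimal sketch of this idea for the single-covariate exponential model in the question may help connect u, z, t and lambda. In the question's code each z[i] is marginally standard normal (the diagonal of corT is 1), so pnorm(z) plays the same role as runif; here rnorm stands in for the spatially correlated draw. The names Finv_closed and Finv_num, the sample size n and the seed are illustrative assumptions, not part of the authors' code.

```r
# Sketch: inverse-transform sampling for the exponential PH model in the question.
set.seed(1)                 # arbitrary seed, for reproducibility only
betaT <- -1                 # true coefficient, as in the question
n <- 2000                   # illustrative sample size (assumption)
x <- runif(n, -1.5, 1.5)    # covariate, generated as in the question

# CDF from the question: baseline cumulative hazard Lambda0(t) = t
Fi <- function(t, xi) 1 - exp(-t * exp(xi * betaT))

# Closed-form inverse, available because this baseline is exponential
Finv_closed <- function(u, xi) -log(1 - u) / exp(xi * betaT)

# Numerical inverse via uniroot, mirroring Finvsingle in the question
Finv_num <- function(u, xi) {
  uniroot(function(t) Fi(t, xi) - u, lower = 0, upper = 5000, tol = 1e-10)$root
}

# Independent case: pnorm of a standard normal draw is Uniform(0, 1)
z <- rnorm(n)
u <- pnorm(z)

# Survival times: t solves F(t, x) = u for each subject
t_closed <- Finv_closed(u, x)
t_num <- mapply(Finv_num, u[1:20], x[1:20])   # numerical check on a subset

# Largest relative gap between the numerical and closed-form inverses
max_rel_err <- max(abs(t_num - t_closed[1:20]) / t_closed[1:20])

# Kolmogorov-Smirnov check that u is consistent with Uniform(0, 1)
ks_u <- ks.test(u, "punif")
```

Replacing rnorm(n) with the mvrnorm draw from the question leaves each u[i] marginally uniform and only adds dependence between subjects through corT, which is where s1 and s2 enter.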
