
Generate random data based on correlation matrix for multiple timesteps in R

I would like to simulate data for a number of cases (e.g. nPerson = 1000 observations) at several consecutive timesteps (e.g. ts = 3) for N intercorrelated variables (e.g. N = 5).

The simulation should be based on a correlation matrix (corrMat, with nrow = N and ncol = N, one row and one column per variable). corrMat should be identical for all timesteps.

I already found out that the MASS package has a function to create random data fitting the constraints given by corrMat.

library(MASS)  # provides mvrnorm()
t1 <- mvrnorm(nPerson, mu = rep(0, N), Sigma = corrMat, empirical = TRUE)
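To make the call above concrete, here is a minimal self-contained sketch. The specific corrMat (all pairwise correlations equal to 0.1) and the seed are assumptions chosen only for illustration; they are not part of the original problem.

```r
library(MASS)  # provides mvrnorm()

set.seed(1)    # arbitrary seed, only for reproducibility
nPerson <- 1000
N <- 5

# Hypothetical example matrix: every pair of variables correlates at 0.1
corrMat <- matrix(0.1, nrow = N, ncol = N)
diag(corrMat) <- 1

t1 <- mvrnorm(nPerson, mu = rep(0, N), Sigma = corrMat, empirical = TRUE)

# With empirical = TRUE the sample correlations reproduce corrMat
# up to floating-point error, so this difference is tiny
max(abs(cor(t1) - corrMat))
```

With empirical = TRUE, mu and Sigma specify the sample mean and sample covariance rather than the population values, which is why cor(t1) matches corrMat so closely.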

Now I would like to simulate t2 as a function of t1 and corrMat. The variables of t2 should therefore correlate according to corrMat, and they should also have the same variances as the variables of t1.

One important constraint: for the initial values corrMat[i,i] = 1, but for consecutive timesteps it should be possible that corrMat[i,i] < 1, because each variable depends on itself one timestep earlier, and a perfect correlation is not intended.

Maybe there is a variance decomposition of the correlation matrix that yields an error variance for each of the N variables at the next timestep. One could then calculate the values at timestep t+1 as the sum of the variables at timestep t, weighted by their correlations, plus a random error with mean 0 that is distributed according to that error variance, so that the correlation matrix is replicated again at t+1.

Assuming normal errors:

getRand <- function (range) {
  return (rnorm(1,mean=0, sd=range)  )
}

This is the (very simplified) code for the i-th variable x_i of a single observation:

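# x_t: length-N vector with the values of one observation at timestep t
# x_next_i: value of the i-th variable of that observation at timestep t+1
x_next_i <- 0
for (j in 1:N) {
  x_next_i <- x_next_i + corrMat[i, j] * x_t[j]
}
x_next_i <- x_next_i + getRand(sdErr)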

So, more specifically, the question is: how do I calculate sdErr?

For simplification I assume that the variance of all variables should be 1.

Thank you for any hint on how to get one step further!

I will post a mathematical formulation of the problem on stats.stackexchange.com, as mikeck suggested, to discuss the details of the correlation problem in more depth.

I am still interested in finding a general formula to calculate sdErr for use in the calculation of x_i[t+1].
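One candidate for such a formula, offered only as a sketch under stated assumptions: if the variables at timestep t have unit variance and covariance corrMat, and the weights for the step to t+1 are collected in a matrix (called A here, a hypothetical name), then the variance of variable i before any error is added is the i-th diagonal element of A %*% corrMat %*% t(A), and sdErr[i]^2 is 1 minus that value. The example matrices below are assumptions for illustration.

```r
N <- 5

# Hypothetical correlation matrix at timestep t (unit variances assumed)
corrMat <- matrix(0.1, nrow = N, ncol = N)
diag(corrMat) <- 1

# Hypothetical weight matrix for the step t -> t+1:
# same off-diagonal entries as corrMat, but a self-weight below 1
A <- corrMat
diag(A) <- 0.5

# Variance of each variable at t+1 before any error is added
varNoErr <- diag(A %*% corrMat %*% t(A))

# Error variance needed to bring each variance back to 1
errVar <- 1 - varNoErr
stopifnot(all(errVar >= 0))  # otherwise no real-valued sdErr exists
sdErr <- sqrt(errVar)
sdErr
```

Two caveats apply. If the weights are too large, 1 - varNoErr can come out negative, and then no real-valued sdErr exists. Also, independent errors of this kind only restore the variances; reproducing the off-diagonal correlations as well would require correlated errors with covariance corrMat - A %*% corrMat %*% t(A), provided that matrix is positive semidefinite.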

But meanwhile I found a useful practical solution to the specific question "how to calculate sdErr?" without a formula for sdErr:

(1) simply calculate all variables WITHOUT errors (according to the equation above).

(2) calculate variances of the new variables

(3) calculate (for each i) the difference var(x_i[t]) - var(x_i[t+1]) = sdErr^2

A random error with standard deviation sdErr can then be added to each variable for each new observation. This should lead to observations at t+1 which at least have the same variances as the observations at t.
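The three steps can be sketched in R as follows. This is only an illustration: the example corrMat, the weight matrix A (corrMat with its diagonal lowered to 0.5) and the seed are assumptions, not values from the original problem.

```r
library(MASS)  # provides mvrnorm()

set.seed(1)    # arbitrary seed, only for reproducibility
nPerson <- 1000
N <- 5

# Hypothetical correlation matrix and weight matrix (self-weight below 1)
corrMat <- matrix(0.1, nrow = N, ncol = N)
diag(corrMat) <- 1
A <- corrMat
diag(A) <- 0.5

t1 <- mvrnorm(nPerson, mu = rep(0, N), Sigma = corrMat, empirical = TRUE)

# (1) values at t+1 WITHOUT errors: column i holds sum_j A[i, j] * x_j[t]
t2NoErr <- t1 %*% t(A)

# (2) variances of the old and the new variables
varT1 <- apply(t1, 2, var)
varT2 <- apply(t2NoErr, 2, var)

# (3) the difference of the variances is the error variance sdErr^2
errVar <- varT1 - varT2
stopifnot(all(errVar >= 0))  # a negative difference has no real sdErr
sdErr <- sqrt(errVar)

# Add independent normal errors, one column per variable
err <- sapply(sdErr, function(s) rnorm(nPerson, mean = 0, sd = s))
t2 <- t2NoErr + err

# Variances at t+1 should now be close to those at t
round(apply(t2, 2, var), 2)
```

Because the errors are drawn at random, the variances of t2 match those of t1 only approximately. The check before sqrt() matters: with weights that are too large, step (3) yields a negative difference and the approach does not apply.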

Details concerning the question of whether the model definition is adequate will be part of another post.
