
Different set.seed each run in R

I want to "measure" which regression method is more robust to outliers.

For this, I sum the variances of the model coefficients. On each run, I generate data from a t-distribution. I call set.seed ten times to get ten specific data sets.

However, I also want ten different seeds on each run, so that in total I get 10 sums of the variances. The code below gives me only one sum, for the first set of ten seeds.

How can I do this?

#######################################
p <- 5
n <- 50
#######################################
FX <- function(seed, data) {
#for loops over a seed #
for (i in seed) {
set.seed(seed)  
# generating data from t-distribution #
x<- matrix(rt(n*p,1), ncol = p)
y<-rt(n,1)
dat=cbind(x,y)
data<-as.data.frame(dat)
# performing a regression model on the data #
lm1 <- lm(y ~ ., data=data)
lm.coefs <- coef(lm1)
            
lad1 <- lad(y ~ ., data=data, method="BR")
lad.coefs <- coef(lad1)
          }
# calculate variance of the coefficients # 
return(`attr<-`(cbind(lmm=var(lm.coefs), lad=var(lad.coefs)), "seed", seed))
}
#######################################
seeds <- 1:10  ## 10 seeds to get different data sets from the t-distribution #
res <- lapply(seeds, FX, data=data) # 10 different variances of 10 data sets/models
sov <- t(sapply(res, colSums)) # put them in a matrix
colSums(sov) # sum of the 10 variances for each model.

   

Here is something closer to your intended results. The code below fixes a key issue in your original code: each call to FX used only the single seed it was passed, so the loop ran just once. It was not clear what data the function was intended to return.

  1. This creates a vector of seed numbers inside the function.

  2. This also creates a vector inside the function to store the variance of the coefficients for each iteration of the loop (not sure if this is what you want).

  3. I needed to comment out the lad function since I do not know which package it is from (you would need to follow point 2 above to add it back in).

  4. Some general cleanup of the code.
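The seed arithmetic from point 1 can be checked on its own. This is a small sketch; `seeds_for_run` and `per_run` are hypothetical names used only for illustration.

```r
# Hypothetical helper mirroring the seed arithmetic: run k gets its own
# block of `per_run` consecutive seeds.
seeds_for_run <- function(run, per_run = 10) {
  start <- (run - 1) * per_run + 1
  seq(start, start + per_run - 1)
}

seeds_for_run(1)  # seeds 1 to 10
seeds_for_run(2)  # seeds 11 to 20

# Across 10 runs the blocks cover 1 to 100 and no seed is reused.
all_seeds <- unlist(lapply(1:10, seeds_for_run))
anyDuplicated(all_seeds) == 0
```

Because the blocks do not overlap, every run draws ten data sets that no other run sees.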

    p <- 5
    n <- 50
    FX <- function(seed, data) {
      # for loops over a seed #
      # Fixes the starting seed issue
      startingSeed <- (seed-1)*10 + 1
      seeds <- seq(startingSeed, startingSeed+9)
      # create vector to store results from loop iteration
      lm.coefs <- vector(mode="numeric", length=10)
      index <- 1
      for (i in seeds) {
        set.seed(i)
        # generating data from t-distribution #
        x <- matrix(rt(n*p,1), ncol = p)
        y <- rt(n,1)
        data <- data.frame(x, y)
        # performing a regression model on the data #
        lm1 <- lm(y ~ ., data=data)
        lm.coefs[index] <- var(coef(lm1))
        # lad1 <- lad(y ~ ., data=data, method="BR")
        # lad.coefs <- coef(lad1)
        index <- index + 1
      }
      # calculate variance of the coefficients #
      return(`attr<-`(cbind(lmm=lm.coefs), "seed", seed))
    }
    seeds <- 1:10  ## 10 seeds to get different data sets from the t-distribution #
    res <- lapply(seeds, FX, data=data)  # 10 different variances of 10 data sets/models
    sov <- t(sapply(res, colSums))  # put them in a matrix
    colSums(sov)  # sum of the 10 variances for each model.
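To bring the LAD fit back in alongside `lm`, something like the following might work. It is only a sketch: it assumes `lad` comes from the L1pack package (the question does not say), and `run_one` and `FX2` are hypothetical names. If L1pack is not installed, the LAD column is left as NA.

```r
# Sketch only: assumes lad() is L1pack::lad(); run_one and FX2 are
# hypothetical names, not part of the original code.
p <- 5
n <- 50
has_lad <- requireNamespace("L1pack", quietly = TRUE)

# Simulate one data set for a given seed, fit both models, and return the
# variance of each model's coefficient vector.
run_one <- function(seed) {
  set.seed(seed)
  x <- matrix(rt(n * p, df = 1), ncol = p)
  y <- rt(n, df = 1)
  dat <- data.frame(x, y)
  lm_var <- var(coef(lm(y ~ ., data = dat)))
  lad_var <- if (has_lad) {
    var(coef(L1pack::lad(y ~ ., data = dat, method = "BR")))
  } else {
    NA_real_  # L1pack not available, so no LAD result
  }
  c(lm = lm_var, lad = lad_var)
}

# Run k uses seeds (k - 1) * 10 + 1 through k * 10 and returns the sum of
# the ten variances for each model.
FX2 <- function(run) {
  seeds <- seq((run - 1) * 10 + 1, length.out = 10)
  colSums(t(sapply(seeds, run_one)))
}

sov <- t(sapply(1:10, FX2))  # 10 x 2 matrix: one row per run
sov
```

Each row of `sov` is one run's sum of variances, giving the 10 sums per model that the question asks for; `colSums(sov)` would collapse them into a single total per model.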

Hope this provides the answer or at least guidance to solve your problem.
