Different set.seed each run in R

I want to "measure" which regression method is more robust to outliers.

For this, I sum the variances of the model coefficients. On each run, I generate data from a t-distribution. I call set.seed ten times to get ten specific data sets.

However, I also want ten different seeds on each run. So, in total, I will have 10 sums of the variances. The code below gives me only one sum, for the first set of ten different seeds.

How can I do this?

    #######################################
    p <- 5
    n <- 50
    #######################################
    FX <- function(seed, data) {
      # for loop over a seed
      for (i in seed) {
        set.seed(seed)
        # generating data from t-distribution
        x <- matrix(rt(n*p, 1), ncol = p)
        y <- rt(n, 1)
        dat <- cbind(x, y)
        data <- as.data.frame(dat)
        # performing a regression model on the data
        lm1 <- lm(y ~ ., data = data)
        lm.coefs <- coef(lm1)

        lad1 <- lad(y ~ ., data = data, method = "BR")
        lad.coefs <- coef(lad1)
      }
      # calculate variance of the coefficients
      return(`attr<-`(cbind(lmm = var(lm.coefs), lad = var(lad.coefs)), "seed", seed))
    }
    #######################################
    seeds <- 1:10  ## 10 seeds to get different data sets from the t-distribution
    res <- lapply(seeds, FX, data = data)  # 10 different variances of 10 data sets/models
    sov <- t(sapply(res, colSums))  # put them in a matrix
    colSums(sov)  # sum of 10 variances for each model

   

Here is something closer to your intended results. The code below fixes a key issue in your original code. It was not clear what data the function was intended to return.

  1. This creates a vector of seed numbers inside the function.

  2. This also creates a vector inside the function to store the variance of the coefficients for each iteration of the loop (not sure if this is what you want).

  3. I needed to comment out the lad function since I do not know which package it is from (you would need to follow item 2 above to add it back in).

  4. Some general cleanup of the code.
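The seed bookkeeping from item 1 can be checked on its own before fitting any models. This is only a minimal sketch: `seed_block` and its `per_run` argument are hypothetical names, not part of the original code, and it assumes each run should get its own consecutive block of ten seeds.

```r
# Hypothetical helper (not in the original code): the seeds used by one run.
# Run 1 gets seeds 1..10, run 2 gets 11..20, and so on.
seed_block <- function(run, per_run = 10) {
  start <- (run - 1) * per_run + 1
  seq(start, length.out = per_run)
}

seed_block(1)
seed_block(2)

# Across ten runs, no seed should be reused.
all_seeds <- unlist(lapply(1:10, seed_block))
stopifnot(anyDuplicated(all_seeds) == 0)
```

Because the blocks do not overlap, every run sees ten data sets that no other run sees, while each run stays reproducible.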

    p <- 5
    n <- 50

    FX <- function(seed, data) {
      # for loop over a seed
      # Fixes the starting seed issue
      startingSeed <- (seed - 1) * 10 + 1
      seeds <- seq(startingSeed, startingSeed + 9)
      # create vector to store results from loop iteration
      lm.coefs <- vector(mode = "numeric", length = 10)
      index <- 1
      for (i in seeds) {
        set.seed(i)
        # generating data from t-distribution
        x <- matrix(rt(n*p, 1), ncol = p)
        y <- rt(n, 1)
        data <- data.frame(x, y)
        # performing a regression model on the data
        lm1 <- lm(y ~ ., data = data)
        lm.coefs[index] <- var(coef(lm1))
        # lad1 <- lad(y ~ ., data = data, method = "BR")
        # lad.coefs <- coef(lad1)
        index <- index + 1
      }
      # calculate variance of the coefficients
      return(`attr<-`(cbind(lmm = lm.coefs), "seed", seed))
    }

    seeds <- 1:10  ## 10 seeds to get different data sets from the t-distribution
    res <- lapply(seeds, FX, data = data)  # 10 different variances of 10 data sets/models
    sov <- t(sapply(res, colSums))  # put them in a matrix
    colSums(sov)  # sum of 10 variances for each model
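If the goal is simply one sum of coefficient variances per run, the same idea can be written more compactly. The sketch below is one possible arrangement, not the only one: `coef_var`, `run_sum`, `fit_lm`, and the `fit` argument are hypothetical names introduced here, and it assumes the same data-generating setup as above (t-distribution with 1 degree of freedom, `n = 50`, `p = 5`).

```r
p <- 5   # number of predictors
n <- 50  # observations per data set

# Variance of the fitted coefficients for the data set generated by one seed.
# `fit` is any function that maps a data frame to a coefficient vector.
coef_var <- function(seed, fit) {
  set.seed(seed)
  x <- matrix(rt(n * p, df = 1), ncol = p)
  y <- rt(n, df = 1)
  var(fit(data.frame(x, y)))
}

# Sum of the coefficient variances over the ten seeds belonging to one run.
run_sum <- function(run, fit, per_run = 10) {
  seeds <- seq((run - 1) * per_run + 1, length.out = per_run)
  sum(vapply(seeds, coef_var, numeric(1), fit = fit))
}

# Ordinary least squares fitter.
fit_lm <- function(d) coef(lm(y ~ ., data = d))

# Ten sums, one per run.
sums_lm <- vapply(1:10, run_sum, numeric(1), fit = fit_lm)
sums_lm
```

Passing the fitter as an argument means a second method can be compared on exactly the same data sets. For example, if the `lad` call in the question comes from the L1pack package (an assumption, since the question does not say), a fitter such as `function(d) coef(L1pack::lad(y ~ ., data = d, method = "BR"))` could be passed to `run_sum` in the same way.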

Hope this provides the answer or at least guidance to solve your problem.
