
How to run a Monte Carlo simulation for multiple regression in R?

I want to run a Monte Carlo simulation for a multiple regression model that predicts mpg, and then evaluate how many times each car has a better performance (lower mpg) than the other. This is what I have so far:

library(pacman)
pacman::p_load(data.table, fixest, stargazer, dplyr, magrittr)

df <- mtcars
fit <- lm(mpg~cyl + hp, data = df)
fit$coefficients[1]

beta_0 = fit$coefficients[1] # Intercept 
beta_1 = fit$coefficients[2] # Slope (cyl)
beta_2 = fit$coefficients[3] # slope (hp)
set.seed(1)  # Seed
n = 1000     # Sample size
M = 500      # Number of experiments/iterations

## Storage 
slope_DT <- rep(0,M)
slope_DT_2 <- rep(0,M)
intercept_DT <- rep(0,M)

## Begin Monte Carlo

for (i in 1:M){ #  M is the number of iterations
  
  # Generate data
  U_i = rnorm(n, mean = 0, sd = 2) # Error
  X_i = rnorm(n, mean = 5, sd = 5) # Independent variable
  Y_i = beta_0 + beta_1*X_i + beta_2*X_i +U_i  # Dependent variable
  
  # Formulate data.table
  data_i = data.table(Y = Y_i, X = X_i)
  
  # Run regressions
  ols_i <- fixest::feols(data = data_i, Y ~ X)
  
  # Extract slope coefficient and save
  slope_DT_2[i] <- ols_i$coefficients[3]
  slope_DT[i] <- ols_i$coefficients[2]
  intercept_DT[i] <- ols_i$coefficients[1]
  
}


# Summary statistics
estimates_DT <- data.table(beta_2 = slope_DT_2,beta_1 = slope_DT, beta_0 = intercept_DT)

This code does not produce a coefficient for hp. I want to know how to add that coefficient to the model, then predict results and count how many times one car has a lower mpg than the other, for example how many times the Mazda RX4 has a lower predicted mpg than the Datsun 710. Any ideas on how I can make this work? Thank you.

As I noted in my comment, you should use two independent variables. In your loop both slopes multiply the same X_i and the regression Y ~ X has a single regressor, so there is no third coefficient to extract. I would also suggest the lapply function, which makes the code shorter because you don't need the initialization/storage part.

estimates_DT <- do.call("rbind",lapply(1:M, function(i) {
  # Generate data
  U_i = rnorm(n, mean = 0, sd = 2) # Error
  X_i_1 = rnorm(n, mean = 5, sd = 5) # First independent variable
  X_i_2 = rnorm(n, mean = 5, sd = 5) # Second independent variable
  Y_i = beta_0 + beta_1*X_i_1 + beta_2*X_i_2 + U_i  # Dependent variable

  # Formulate data.table
  data_i = data.table(Y = Y_i, X1 = X_i_1, X2 = X_i_2)
  
  # Run regressions
  ols_i <- fixest::feols(data = data_i, Y ~ X1 + X2)  
  ols_i$coefficients
}))

estimates_DT <- setNames(data.table(estimates_DT),c("beta_0","beta_1","beta_2"))
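As a quick sanity check, the averages of the simulated estimates should land close to the coefficients used to generate the data. The following is only a minimal sketch of that check: it swaps in base R's lm() for fixest::feols() so it runs without extra packages, and the object names estimates and check are illustrative assumptions rather than part of the code above.

```r
# Sketch only: lm() stands in for fixest::feols(); names are illustrative.
set.seed(1)
n <- 1000   # sample size per experiment
M <- 500    # number of experiments

# "True" coefficients taken from the fit on mtcars, as in the question
fit  <- lm(mpg ~ cyl + hp, data = mtcars)
beta <- unname(coef(fit))   # intercept, cyl slope, hp slope

# One row of estimated coefficients per simulated data set
estimates <- do.call(rbind, lapply(seq_len(M), function(i) {
  X1 <- rnorm(n, mean = 5, sd = 5)
  X2 <- rnorm(n, mean = 5, sd = 5)
  Y  <- beta[1] + beta[2] * X1 + beta[3] * X2 + rnorm(n, mean = 0, sd = 2)
  coef(lm(Y ~ X1 + X2))
}))
colnames(estimates) <- c("beta_0", "beta_1", "beta_2")

# Compare Monte Carlo averages and spreads with the generating values
check <- rbind(true    = beta,
               mc_mean = colMeans(estimates),
               mc_sd   = apply(estimates, 2, sd))
print(round(check, 4))
```

If the mc_mean row differs noticeably from the true row, the data-generating line and the regression formula probably do not describe the same model.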

To compare the predictions for the two cars, define the following function, which takes the names of the two cars you want to compare as arguments:

compareCarEstimations <- function(carname1="Mazda RX4",carname2="Datsun 710") {
  car1data <- mtcars[rownames(mtcars) == carname1,c("cyl","hp")]
  car2data <- mtcars[rownames(mtcars) == carname2,c("cyl","hp")]
  
  predsCar1 <- estimates_DT[["beta_0"]] + car1data$cyl*estimates_DT[["beta_1"]]+car1data$hp*estimates_DT[["beta_2"]]
  predsCar2 <- estimates_DT[["beta_0"]] + car2data$cyl*estimates_DT[["beta_1"]]+car2data$hp*estimates_DT[["beta_2"]]
  
  list(
    car1LowerCar2 = sum(predsCar1 < predsCar2),
    car2LowerCar1 = sum(predsCar1 >= predsCar2)
  )
}

Make sure the names passed as arguments are valid, i.e. that they appear in rownames(mtcars).
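This matters because a misspelled name selects zero rows, so the predictions become empty vectors and both counts come back as 0 without any error. One way to guard against that is sketched below; checkCarNames is a hypothetical helper introduced only for illustration, not something from the code above.

```r
# Sketch only: checkCarNames() is a hypothetical helper for illustration.
checkCarNames <- function(...) {
  carnames <- c(...)
  unknown  <- setdiff(carnames, rownames(mtcars))
  if (length(unknown) > 0) {
    stop("Not in rownames(mtcars): ", paste(unknown, collapse = ", "))
  }
  invisible(TRUE)
}

# Valid names pass silently
checkCarNames("Mazda RX4", "Datsun 710")

# A name misspelled on purpose raises an error; capture its message
msg <- tryCatch(checkCarNames("Mazda RX-4"),
                error = function(e) conditionMessage(e))
print(msg)
```

Calling checkCarNames(carname1, carname2) as the first line of compareCarEstimations would turn the silent zero counts into an explicit error.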


 