How to run a Monte Carlo simulation for multiple regression in R?
I want to run a Monte Carlo simulation for a multiple regression model that predicts mpg, and then count how many times each car performs better (has a lower mpg) than another. This is what I have so far:
library(pacman)
pacman::p_load(data.table, fixest, stargazer, dplyr, magrittr)
df <- mtcars
fit <- lm(mpg~cyl + hp, data = df)
fit$coefficients[1]
beta_0 = fit$coefficients[1] # Intercept
beta_1 = fit$coefficients[2] # Slope (cyl)
beta_2 = fit$coefficients[3] # slope (hp)
set.seed(1) # Seed
n = 1000 # Sample size
M = 500 # Number of experiments/iterations
## Storage
slope_DT <- rep(0,M)
slope_DT_2 <- rep(0,M)
intercept_DT <- rep(0,M)
## Begin Monte Carlo
for (i in 1:M){ # M is the number of iterations
# Generate data
U_i = rnorm(n, mean = 0, sd = 2) # Error
X_i = rnorm(n, mean = 5, sd = 5) # Independent variable
Y_i = beta_0 + beta_1*X_i + beta_2*X_i +U_i # Dependent variable
# Formulate data.table
data_i = data.table(Y = Y_i, X = X_i)
# Run regressions
ols_i <- fixest::feols(data = data_i, Y ~ X)
# Extract slope coefficient and save
slope_DT_2[i] <- ols_i$coefficients[3]
slope_DT[i] <- ols_i$coefficients[2]
intercept_DT[i] <- ols_i$coefficients[1]
}
# Summary statistics
estimates_DT <- data.table(beta_2 = slope_DT_2,beta_1 = slope_DT, beta_0 = intercept_DT)
This code does not produce any coefficient for hp. I want to know how to add coefficients to the model, then predict results and test how many times one car has a lower mpg than another, for example how many times Mazda RX4 has a lower predicted mpg than Datsun 710. Any ideas on how to make this work? Thank you.
As I noted in my comment, you should use two independent variables. I would also suggest the lapply function, which makes the code shorter because you no longer need the initialization/storage part.
estimates_DT <- do.call("rbind",lapply(1:M, function(i) {
# Generate data
U_i = rnorm(n, mean = 0, sd = 2) # Error
X_i_1 = rnorm(n, mean = 5, sd = 5) # First independent variable
X_i_2 = rnorm(n, mean = 5, sd = 5) # Second independent variable
Y_i = beta_0 + beta_1*X_i_1 + beta_2*X_i_2 + U_i # Dependent variable
# Formulate data.table
data_i = data.table(Y = Y_i, X1 = X_i_1, X2 = X_i_2)
# Run regressions
ols_i <- fixest::feols(data = data_i, Y ~ X1 + X2)
ols_i$coefficients
}))
estimates_DT <- setNames(data.table(estimates_DT),c("beta_0","beta_1","beta_2"))
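As a sanity check on the simulation above, the averaged estimates should sit close to the coefficients used to generate the data. The following is only a minimal sketch under a few assumptions: it uses base R `lm()` in place of `fixest::feols()` so that it runs without extra packages, and the names `sim` and `check` are made up for this illustration.

```r
# Self-contained sketch: base R only, lm() stands in for fixest::feols()
set.seed(1)
fit  <- lm(mpg ~ cyl + hp, data = mtcars)
beta <- coef(fit)   # coefficients used as the "true" values when generating data
n <- 1000           # sample size per experiment
M <- 500            # number of experiments

# One row of estimates (intercept, slope 1, slope 2) per experiment
sim <- t(vapply(seq_len(M), function(i) {
  x1 <- rnorm(n, mean = 5, sd = 5)   # first independent variable
  x2 <- rnorm(n, mean = 5, sd = 5)   # second independent variable
  y  <- beta[1] + beta[2] * x1 + beta[3] * x2 + rnorm(n, mean = 0, sd = 2)
  unname(coef(lm(y ~ x1 + x2)))
}, numeric(3)))
colnames(sim) <- c("beta_0", "beta_1", "beta_2")

# Compare the Monte Carlo averages with the generating coefficients
check <- rbind(true    = unname(beta),
               mc_mean = colMeans(sim),
               mc_sd   = apply(sim, 2, sd))
print(round(check, 4))
```

If the `mc_mean` row differs noticeably from the `true` row, the data-generating line and the regression formula probably do not use the same variables, which is what happened in the question's original loop.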
To compare the predictions for two cars, define the following function, which takes the names of the two cars you want to compare as arguments:
compareCarEstimations <- function(carname1="Mazda RX4",carname2="Datsun 710") {
car1data <- mtcars[rownames(mtcars) == carname1,c("cyl","hp")]
car2data <- mtcars[rownames(mtcars) == carname2,c("cyl","hp")]
predsCar1 <- estimates_DT[["beta_0"]] + car1data$cyl*estimates_DT[["beta_1"]]+car1data$hp*estimates_DT[["beta_2"]]
predsCar2 <- estimates_DT[["beta_0"]] + car2data$cyl*estimates_DT[["beta_1"]]+car2data$hp*estimates_DT[["beta_2"]]
list(
car1LowerCar2 = sum(predsCar1 < predsCar2),
car2LowerCar1 = sum(predsCar1 >= predsCar2)
)
}
Make sure the names passed as arguments are valid, i.e. that they appear in rownames(mtcars).
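If you would rather have the check done for you, something along these lines could be called before the comparison. This is just a sketch: `checkCarNames` is a hypothetical helper that is not part of the answer above, and it only relies on base R.

```r
# Hypothetical helper (not in the original answer): stop early on unknown car names
checkCarNames <- function(...) {
  carnames <- c(...)
  unknown <- setdiff(carnames, rownames(mtcars))
  if (length(unknown) > 0) {
    stop("Not in rownames(mtcars): ", paste(unknown, collapse = ", "))
  }
  invisible(TRUE)
}

# Valid names pass silently
checkCarNames("Mazda RX4", "Datsun 710")

# A misspelled name raises an error, caught here so the script keeps running
ok <- tryCatch(checkCarNames("Mazda RX-4"), error = function(e) FALSE)
```

Without such a check, a misspelled name selects zero rows from mtcars, so the predictions come back empty and both counts are 0 instead of an error being raised.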