
Monte Carlo simulation in R

I am trying to simulate data (Y) from an AR(1) model with rho = 0.7. I will then regress Y on an intercept only (so that the parameter estimate is the mean of Y) and test the null hypothesis that the coefficient is less than or equal to zero (the alternative is that it is greater than zero) using robust standard errors. I want to run a Monte Carlo simulation of this test with 2000 replications for different lag values. The purpose is to show how the finite-sample performance of the Newey-West estimator changes with the lag. This is how I began:

A<-array(0, dim=c(2000,1))
for(i in 1:2000){
  y_new<-arima.sim(model=list(ar=0.7), n=50, mean=0,sd=1)
  reg<-lm(y_new~1)
  ad<-coeftest(reg, alternative="greater", vcov=NeweyWest(reg, lag=1, prewhite=FALSE))
  A[i]<-ad[,3]
}

My question: is the code above the right way to do this kind of simulation? If it is, how can I repeat this process for different lag values in the HAC test? I want to rerun the test, increasing the lag by 1 each time, so I will do this 50 times for lags 1, 2, 3, 4, ..., 50, each time storing the 2000 simulated test statistics in a differently named vector. I then want to calculate the rejection probability of the test (significance level 0.05, using the critical value 1.645) for each lag and plot the rejection probabilities against the lag values. Please help.

Because you didn't mention the purpose of the simulation, it is hard to tell whether this is the right way.

You can save a lot of time by computing all 50 test statistics from each simulated sample, instead of drawing 2000 new samples for each lag (which would mean 2000*50 simulated samples in total).
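To see why one sample can serve every lag, it may help to write the statistic out. This is only a sketch, assuming the Bartlett-kernel form of the Newey-West estimator without prewhitening or small-sample adjustment; the symbols $\hat\gamma_j$, $\hat\omega_L^2$ and $t_L$ are notation introduced here, not taken from the post.

```latex
\hat\gamma_j = \frac{1}{n}\sum_{t=j+1}^{n}(Y_t-\bar Y)(Y_{t-j}-\bar Y),
\qquad
\hat\omega_L^2 = \hat\gamma_0 + 2\sum_{j=1}^{L}\Bigl(1-\frac{j}{L+1}\Bigr)\hat\gamma_j,
\qquad
t_L = \frac{\bar Y}{\sqrt{\hat\omega_L^2/n}}
```

Only the kernel weights depend on the lag $L$, so the sample, the fitted mean and the autocovariances can be reused for every lag. With the one-sided alternative from the question, the null is rejected at the 5% level when $t_L > 1.645$.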

A better way to structure the simulation is:

library(AER)
library(dplyr)
lags <- 1:50
nreps <- 2000

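sim <- function() {
  # Simulate one AR(1) sample and fit the intercept-only regression once
  ynew <- arima.sim(model = list(ar = 0.7), n = 50, mean = 0, sd = 1)
  reg <- lm(ynew ~ 1)
  s <- rep(NA_real_, length(lags))
  for (i in seq_along(lags)) {
    ad <- coeftest(reg, vcov. = NeweyWest(reg, lag = lags[i], prewhite = FALSE))
    # coeftest() reports two-sided p-values; convert the t statistic to a
    # one-sided p-value for the alternative "greater"
    s[i] <- pt(ad[, 3], df = df.residual(reg), lower.tail = FALSE)
  }
  s
}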

The following code stores the simulation results in a data.frame:

result <- lapply(1:nreps, function(i) data.frame(simulation = i, lag = lags, pvalues = sim())) %>%
  bind_rows()  # rbind_all() has been removed from dplyr

From your vague description, I infer that what you want looks something like this:

library(ggplot2)
result %>% 
  group_by(lag) %>% 
  summarize(rejectfreq = mean(pvalues < 0.05)) %>%  # reject H0 when p < 0.05
  ggplot(., aes(lag, rejectfreq)) + geom_line()+
  coord_cartesian(ylim = c(0,1)) +
  scale_y_continuous(breaks=seq(0, 1, by=0.1))

[Figure: rejection frequency (rejectfreq) plotted against lag]

Although the figure was created using only 100 simulations, it is evident that the choice of lag in Newey-West does not matter much when the disturbance terms are iid.
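The question asks for rejection probabilities based on the critical value 1.645 rather than on p-values. One possible way to do that is sketched below: it keeps the t statistics for all lags in a single matrix instead of 50 separately named vectors. It assumes the sandwich package is installed; the names `one_rep`, `tstats`, `reject_prob` and `crit`, the seed, and the reduced `nreps` and `lags` are illustrative choices, not part of the original post.

```r
# Sketch only: one-sided rejection frequencies per Newey-West lag.
library(sandwich)

set.seed(1)      # hypothetical seed, for reproducibility only
nreps <- 200     # the question uses 2000; smaller here to keep it fast
lags  <- 1:10    # the question uses 1:50
crit  <- 1.645   # one-sided 5% critical value given in the question

# One replication: simulate once, then reuse the same fit for every lag.
one_rep <- function() {
  y   <- arima.sim(model = list(ar = 0.7), n = 50)
  reg <- lm(y ~ 1)
  sapply(lags, function(l) {
    # Newey-West standard error of the intercept (the sample mean)
    se <- sqrt(NeweyWest(reg, lag = l, prewhite = FALSE)[1, 1])
    unname(coef(reg)[1]) / se
  })
}

# tstats has one row per lag and one column per replication.
tstats <- replicate(nreps, one_rep())

# Share of replications in which H0 (mean <= 0) is rejected, per lag.
reject_prob <- rowMeans(tstats > crit)

plot(lags, reject_prob, type = "b", ylim = c(0, 1),
     xlab = "Newey-West lag", ylab = "Rejection frequency")
```

Because the true mean is zero in this design, `reject_prob` estimates the size of the test at each lag, so values near 0.05 would indicate a well-sized test.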


 