Bootstrapping/Monte Carlo Simulation in R

I am trying to follow this test:

[Image: statistical test]

Suppose I have the following data:

set.seed(123)

active_MJO   <-c(6L, 2L, 11L, 20L, 62L, 15L, 2L, 51L, 58L, 100L, 45L, 44L, 49L, 
                86L, 28L, 1L, 1L, 40L, 79L, 99L, 86L, 50L, 9L, 78L, 45L, 100L, 
                77L, 44L, 45L, 93L)

inactive_MJO <-c(83L, 170L, 26L, 66L, 156L, 40L, 29L, 72L, 109L, 169L, 153L, 
               136L, 169L, 133L, 153L, 13L, 24L, 148L, 121L, 80L, 125L, 21L, 
               135L, 155L, 161L, 171L, 124L, 177L, 167L, 162L)

I don't know how to implement the above test in R.

I have tried the following, but I am not sure if it is correct.

sig.test <- function (x){
a <- sample(active_MJO)
b <- sample(inactive_MJO)
sum(a > b)
}

runs <- 1000
sim <- sum(replicate(runs,sig.test(dat))+1)/(runs+1)

I think the above is not correct. Where can I put the 950/1000 condition?

Apologies, I am new to bootstrapping/Monte Carlo tests.

I'd appreciate any help on this.

Sincerely, Lyndz

First, it's important to note that they are sampling 30 frequency pairs. Since it's bootstrapping, those samples are drawn with replacement.

Then they compare the average active frequency to the average inactive frequency. This is equivalent to either:

  1. comparing the sum of the active against the sum of the inactive from the 30 pairs, or
  2. comparing the sum of the differences within each of the 30 pairs to zero.
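The equivalence of these comparisons can be checked on a single bootstrap resample. The following is only an illustrative sketch: it assumes the pairs are resampled by index so that each active value stays matched with its inactive partner, and the names `idx`, `a`, `b`, `by_mean`, `by_sum` and `by_diff` are introduced here for illustration.

```r
set.seed(123)

active_MJO   <- c(6L, 2L, 11L, 20L, 62L, 15L, 2L, 51L, 58L, 100L, 45L, 44L, 49L,
                  86L, 28L, 1L, 1L, 40L, 79L, 99L, 86L, 50L, 9L, 78L, 45L, 100L,
                  77L, 44L, 45L, 93L)
inactive_MJO <- c(83L, 170L, 26L, 66L, 156L, 40L, 29L, 72L, 109L, 169L, 153L,
                  136L, 169L, 133L, 153L, 13L, 24L, 148L, 121L, 80L, 125L, 21L,
                  135L, 155L, 161L, 171L, 124L, 177L, 167L, 162L)

# Draw 30 pair indices with replacement (assumption: pairs stay matched)
idx <- sample(seq_along(active_MJO), 30, replace = TRUE)
a <- active_MJO[idx]
b <- inactive_MJO[idx]

by_mean <- mean(a) > mean(b)  # compare the averages
by_sum  <- sum(a) > sum(b)    # option 1: compare the sums
by_diff <- sum(a - b) > 0     # option 2: compare the sum of differences to zero

# All three comparisons give the same logical answer
c(by_mean = by_mean, by_sum = by_sum, by_diff = by_diff)
```

Because both groups contribute the same number of values (30), dividing by 30 does not change the direction of the inequality, which is why working with sums or differences is enough.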

They repeat the process 1000 times, then compare the count obtained from those 1000 comparisons to 950.

The following code performs #2:

set.seed(123)

active_MJO   <-c(6L, 2L, 11L, 20L, 62L, 15L, 2L, 51L, 58L, 100L, 45L, 44L, 49L, 
                 86L, 28L, 1L, 1L, 40L, 79L, 99L, 86L, 50L, 9L, 78L, 45L, 100L, 
                 77L, 44L, 45L, 93L)
inactive_MJO <-c(83L, 170L, 26L, 66L, 156L, 40L, 29L, 72L, 109L, 169L, 153L, 
                 136L, 169L, 133L, 153L, 13L, 24L, 148L, 121L, 80L, 125L, 21L, 
                 135L, 155L, 161L, 171L, 124L, 177L, 167L, 162L)

diff_MJO <- active_MJO - inactive_MJO
sim <- sum(replicate(1e3, sum(sample(diff_MJO, 30, replace = TRUE)) > 0))

> sim
[1] 0

In this case, none of the 1000 replications resulted in an average active_MJO that was greater than the average inactive_MJO. This is unsurprising after plotting the histogram of sums of bootstrapped differences:

# Store the bootstrapped sums under a new name so diff_MJO is not overwritten
boot_sums <- replicate(1e5, sum(sample(diff_MJO, 30, replace = TRUE)))
hist(boot_sums)

[Image: histogram of the sums of bootstrapped differences]
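As for where the 950/1000 condition goes, one possible reading is that a result counts as significant when the comparison comes out the same way in at least 950 of the 1000 resamples. The sketch below applies that reading in both directions. The cut-off rule (`>=`), the two-direction check, and the names `runs`, `threshold`, `boot_sums`, `n_active_greater` and `n_inactive_greater` are assumptions made for illustration and are not taken from the test description.

```r
set.seed(123)

active_MJO   <- c(6L, 2L, 11L, 20L, 62L, 15L, 2L, 51L, 58L, 100L, 45L, 44L, 49L,
                  86L, 28L, 1L, 1L, 40L, 79L, 99L, 86L, 50L, 9L, 78L, 45L, 100L,
                  77L, 44L, 45L, 93L)
inactive_MJO <- c(83L, 170L, 26L, 66L, 156L, 40L, 29L, 72L, 109L, 169L, 153L,
                  136L, 169L, 133L, 153L, 13L, 24L, 148L, 121L, 80L, 125L, 21L,
                  135L, 155L, 161L, 171L, 124L, 177L, 167L, 162L)

diff_MJO  <- active_MJO - inactive_MJO
runs      <- 1000
threshold <- 950  # assumed cut-off: at least 950 of the 1000 resamples

# Sum of within-pair differences for each bootstrap resample
boot_sums <- replicate(runs, sum(sample(diff_MJO, length(diff_MJO), replace = TRUE)))

n_active_greater   <- sum(boot_sums > 0)  # resamples where active exceeds inactive
n_inactive_greater <- sum(boot_sums < 0)  # resamples where inactive exceeds active

# Apply the assumed 950/1000 condition in each direction
active_significant   <- n_active_greater   >= threshold
inactive_significant <- n_inactive_greater >= threshold

c(active = active_significant, inactive = inactive_significant)
```

With these data the active frequencies are far below the inactive ones, so under this reading only the inactive direction would be expected to pass the threshold, which is consistent with the sim value of 0 reported above.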
