Bootstrapped p-values for a correlation coefficient (resampling method)

Question

I have a large dataset (N = 300,000), and a power analysis led me to conclude that I need only 250 observations to detect a correlation if one is present.

So I want to use a bootstrap to draw 1000 samples of size n = 250 and look at the range of p-values across these 1000 samples. I am quite unfamiliar with the bootstrap method, but below is an example of how far I got with the boot package. I used the iris dataset to illustrate.

My desired output is a histogram showing the frequency distribution of the 1000 p-values, together with the 95% confidence interval of possible p-values.

Can anyone help out with my script?

# load packages; iris ships with the datasets package
library(boot)
library(datasets)

# create function to retrieve the p-value
# boot() calls it with the data and a vector of resampled row indices i
boot.fn <- function(data, i) {
  cor.test(data$Petal.Length[i],
           data$Sepal.Length[i])$p.value
}

# create 1000 samples with the bootstrap function
bootstr <- boot(iris, boot.fn, R = 1000)

Answer 1

The function boot will not provide the desired behavior. However, it is quite simple to implement it yourself:

First, some data:

x1 <- rnorm(1e5)
y1 <- x1 + rnorm(1e5, 0.5)

cor.test(x1, y1)
#output
    Pearson's product-moment correlation

data:  x1 and y1
t = 315.97, df = 99998, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.7037121 0.7099151
sample estimates:
      cor 
0.7068272

Sample 250 indices 1000 times:

#set.seed(1)
z1 <- replicate(1000, sample(seq_along(x1), 250, replace = TRUE))

If sampling without replacement is needed, just remove the replace argument.
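A minimal sketch of the without-replacement variant, in case it helps. The seed and the name z2 are assumptions added for illustration; sample() defaults to replace = FALSE, so each column then holds 250 distinct indices.

```r
set.seed(1)                       # assumed seed, only for reproducibility
x1 <- rnorm(1e5)
y1 <- x1 + rnorm(1e5, 0.5)

# 1000 columns, each with 250 distinct row indices (no replacement)
z2 <- replicate(1000, sample(length(x1), 250))

dim(z2)                           # rows = subsample size, columns = replicates

# anyDuplicated() returns 0 when a column has no repeated index
has_dups <- any(apply(z2, 2, anyDuplicated) > 0)
has_dups
```

With N much larger than 250, the two sampling schemes should give very similar results, since repeated indices are rare either way.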

Now go over the columns, use the indices to subset x1 and y1, calculate the statistic, and plot a histogram of the unlisted result.

hist(unlist(apply(z1, 2, function(x){
  cor.test(x1[x], y1[x])$p.value
})), xlab = "p value", main = "Uh")
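The question also asks for a 95% interval of the p-values. One way to sketch that, assuming an empirical percentile range is what is wanted (it is not a confidence interval in the strict sense), is to keep the p-values in a vector and take quantiles. The names p_vals, p_interval and power_hat, and the seed, are assumptions for illustration.

```r
set.seed(1)                       # assumed seed, only for reproducibility
x1 <- rnorm(1e5)
y1 <- x1 + rnorm(1e5, 0.5)
z1 <- replicate(1000, sample(length(x1), 250, replace = TRUE))

# One p-value per column of indices
p_vals <- apply(z1, 2, function(idx) cor.test(x1[idx], y1[idx])$p.value)

# Empirical 2.5% and 97.5% quantiles of the 1000 p-values
p_interval <- quantile(p_vals, probs = c(0.025, 0.975))
p_interval

# Share of subsamples significant at the 5% level,
# which can be read as an empirical check of the power analysis
power_hat <- mean(p_vals < 0.05)
power_hat
```

With a correlation as strong as in this simulated data, every subsample of 250 is significant, so the histogram of p-values is squeezed against zero.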

Perhaps more informative is:

hist(unlist(apply(z1, 2, function(x){
  cor.test(x1[x], y1[x])$estimate
})), xlab = "cor", main ="Uh")
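As a side note on why boot does not fit this task: with its default settings it draws as many indices as the data has rows, so it cannot produce subsamples of 250 out of a larger dataset. A small sketch to check this on iris; the helper name n_idx is an assumption for illustration.

```r
library(boot)

# Statistic that only reports how many row indices boot() passes in
n_idx <- function(data, i) length(i)

b <- boot(iris, n_idx, R = 5)

# Each of the 5 replicates received nrow(iris) indices
b$t
nrow(iris)
```

That is why the answer above draws the indices directly with sample() instead.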
