Testing 80,000+ simulated sets of normally distributed observations against a null hypothesis

Question

I need to generate a random sample of size 200 (n=200) from a normal distribution with variance 1 and a true mean mu that I specify; then I test the draw against a hypothesis: mu <= 1. I need to do this for each of 400 potential true means (thetas), and for each true theta I need to replicate this 200 times.

I already did this for n=1, but I realize my approach is not replicable. For each of the 400 thetas, I ran the following:

sample_r200n1_t2=normal(loc=-0.99, scale=1, size=200)
sample_r200n1_t3=normal(loc=-0.98, scale=1, size=200)
sample_r200n1_t4=normal(loc=-0.97, scale=1, size=200)
sample_r200n1_t5=normal(loc=-0.96, scale=1, size=200)
... on and on to loc = 3

Then, I tested each element in the generated array separately. However, for n=200 that approach would require me to generate tens of thousands of samples, compute the mean of each, and then test that mean against my criteria. This would have to be done 80,000 times (400 thetas times 200 replications), and on top of this I need to do it for several different sample sizes n. Clearly, this is not the approach to take.

How can I achieve the results I am looking for? Is there a way, for example, to generate the sample means and put them into an array, one per theta? Then I could test as before. Or is there another way?

Answer 1

You can generate all 200 * 200 * 400 = 16 million random values in a single NumPy array, which consumes ~122 megabytes of memory (check with draws.nbytes/1024/1024), and use SciPy to run a one-sided, one-sample t-test on each of the 200 samples of 200 observations for each value of theta. The code uses 401 values of theta so that both endpoints, -1 and 3, are included:

import numpy as np
import matplotlib.pyplot as plt
from numpy.random import normal
from scipy.stats import ttest_1samp

# Array of loc values; for each loc, we draw 200 
# samples of 200 normally distributed observations
locs = np.linspace(-1, 3, 401)

# Array of shape (401, 200, 200) = (locs, samples, observations)
# Note that 200 draws of 200 i.i.d. observations is the same as
# 1 draw of 200*200 i.i.d. observations, reshaped to (200, 200)
draws = np.array([normal(loc=x, scale=1, size=200*200)
                  for x in locs]).reshape(401, 200, 200)

# axis=2 runs one t-test per sample, across its 200 observations.
# The alternative hypothesis that the population mean is less
# than 1 implies a null hypothesis that the population mean is
# greater than or equal to 1
tstats, pvals = ttest_1samp(draws, 1, alternative='less', axis=2)

# Count how many out of 200 t-tests reject the null hypothesis
# at the alpha=0.05 level
rejects = (pvals < 0.05).sum(axis=1)

# Visual check: rejection counts should be high for true means
# far below 1, as these tests should reject the null
# hypothesis that the population mean >= 1
plt.plot(locs, rejects)
plt.axvline(1, c='r')
# Raw strings keep the LaTeX backslashes from being read as escapes
plt.title(r'Number of t-tests rejecting $H_0 : \mu \geq 1$ with $p < 0.05$')
plt.xlabel(r'True mean $\theta$')
plt.show()
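
Since the question also asks for an array of sample means and for several sample sizes n, the same idea can be wrapped in a function. What follows is only a sketch under stated assumptions: the helper name `simulate`, the seed, and the sample sizes (10, 50, 200) are hypothetical choices for illustration, and it keeps the same null hypothesis (mu >= 1) as the code above. It uses NumPy's `default_rng` generator, whose `normal` method broadcasts an array of `loc` values against the requested `size`, so no Python loop over the thetas is needed.

```python
import numpy as np
from scipy.stats import ttest_1samp


def simulate(locs, n, rng, reps=200, mu0=1.0, alpha=0.05):
    """Hypothetical helper: for each true mean in `locs`, draw `reps`
    samples of `n` observations from N(loc, 1) and test H0: mu >= mu0."""
    # loc has shape (len(locs), 1, 1) and broadcasts against `size`, so
    # draws has shape (len(locs), reps, n) = (locs, samples, observations)
    draws = rng.normal(loc=locs[:, None, None], scale=1.0,
                       size=(len(locs), reps, n))
    # One sample mean per (loc, sample) pair: shape (len(locs), reps)
    means = draws.mean(axis=-1)
    # One-sided, one-sample t-test across the observations of each sample
    _, pvals = ttest_1samp(draws, mu0, alternative='less', axis=-1)
    # Number of rejections out of `reps` for each loc: shape (len(locs),)
    rejects = (pvals < alpha).sum(axis=-1)
    return means, rejects


rng = np.random.default_rng(12345)  # seeded so the run can be repeated
locs = np.linspace(-1, 3, 401)
# Sample sizes chosen for illustration; a t-test needs n >= 2
results = {n: simulate(locs, n, rng) for n in (10, 50, 200)}

means_200, rejects_200 = results[200]
print(means_200.shape, rejects_200.shape)
```

Each entry of `results` holds the array of sample means (one row per theta, one column per replication) and the rejection counts for that n, so `rejects_200` can be plotted against `locs` exactly as above. Note that n=1 would not work with a t-test, because the sample standard deviation is undefined for a single observation.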

Solution 1 (0 accepted), 2021-12-19 00:47:26
