
Prove the central limit theorem with a sum when n increases

I'm trying to use Monte Carlo simulation to show how the sum of a uniform sample becomes normally distributed as the sample size increases.

More precisely: define $X \sim U[2,3]$, let $X_1, \dots, X_n$ be an iid sample from $X$, and let $S = \sum_{i=1}^{n} X_i$. I want to use Monte Carlo simulation to show that the distribution of $S$ is approximately normal when $n$ is large (as predicted by the Central Limit Theorem).

What I want to show is that as the number of observations in $S$ rises, its distribution gets closer to normal. It is also important that I'm talking about the sum of the $X_i$, so I'm not considering the usual case with the mean.

The problem is that I obtain a more (or less) normal-looking distribution when I increase (or decrease) the number of Monte Carlo replications. If I change the sample size instead, the differences are VERY small: I can see a normal distribution even when the sample size is 10 and, for example, going from 10 to 100 I can't notice any significant difference.

Here is my MWE:

    # create a uniformly distributed random variable with 10000 draws
    data <- runif(n=10000, min=2, max=3)
    hist(data, col='steelblue', main='Histogram from the Uniform')

    # take, 1000 times, the sum of a sample of size 10 from X
    sample10 <- c()
    n = 1000
    for (i in 1:n){
     sample10[i] = sum(sample(data, 10, replace=TRUE))
    }
    hist(sample10, col ='steelblue', main='Sample size = 10', prob=TRUE)
    qqnorm(sample10); qqline(sample10)


    #Increasing the sample dimension
    sample100 <- c()
    n = 1000
    for (i in 1:n){
     sample100[i] = sum(sample(data, 100, replace=TRUE))
    }
    hist(sample100, col ='steelblue', main='Sample size = 100', prob=TRUE)
    qqnorm(sample100); qqline(sample100)

What am I doing wrong?

PS. Sorry for my English; any request for clarification is welcome.

Here is a simulation of the sums of n random uniforms U(2, 3), with n varying from 1 to 11 in steps of 2. Each sum is replicated 1000 times.

    set.seed(2022)

    nvec <- seq(1, 12, by = 2)
    R <- 1e3
    S_list <- lapply(nvec, \(n) {
      replicate(R, sum(runif(n, 2, 3)))
    })

Created on 2022-12-01 with reprex v2.0.2
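
For reference, here are the moments of the sums being simulated, worked out from the standard formulas for a uniform distribution. The subscript in $S_n$ is notation added here to make the dependence on $n$ explicit. Since $X_i \sim U(2, 3)$,

$$
E[X_i] = \frac{2 + 3}{2} = \frac{5}{2}, \qquad
\operatorname{Var}(X_i) = \frac{(3 - 2)^2}{12} = \frac{1}{12},
$$

so for $S_n = \sum_{i=1}^{n} X_i$ with independent $X_i$,

$$
E[S_n] = \frac{5n}{2}, \qquad
\operatorname{Var}(S_n) = \frac{n}{12}, \qquad
Z_n = \frac{S_n - 5n/2}{\sqrt{n/12}} \xrightarrow{d} N(0, 1).
$$

The sum itself is never "standard" normal: its center and spread grow with $n$. What the CLT describes is the shape of the standardized sum $Z_n$, which is what the histograms and QQ-plots are judging.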

Now the histograms. You will see that convergence is very quick. That feature is even the basis of a CLT-based pseudo-RNG algorithm for the standard normal.
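
As an illustration of that kind of generator, here is a minimal sketch of the classic "sum of 12 uniforms" approximation. It is only an approximation, and the helper name `rnorm_clt` is made up for this example. The sum of 12 independent U(0, 1) variables has mean 6 and variance 1, so subtracting 6 gives something close to a standard normal.

```r
# Approximate standard normal draws via the CLT.
# The sum of 12 U(0, 1) variables has mean 12 * 1/2 = 6 and
# variance 12 * 1/12 = 1, so subtracting 6 standardizes it.
# 'rnorm_clt' is a hypothetical helper name, not a base R function.
rnorm_clt <- function(m) {
  u <- matrix(runif(12 * m), nrow = m, ncol = 12)  # one row per draw
  rowSums(u) - 6
}

set.seed(2022)
z <- rnorm_clt(1e4)
c(mean = mean(z), sd = sd(z))
```

Note that the values are bounded in [-6, 6], so the tails are lighter than those of a true normal; in practice `rnorm` should be preferred.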

    old_par <- par(mfrow = c(2, 3))
    mapply(\(S, n) {
      main <- sprintf("S with n = %d", n)
      hist(S, main = main, freq = FALSE)
      invisible(NULL)
    }, S_list, nvec)

    #> [[1]]
    #> NULL
    #> 
    #> [[2]]
    #> NULL
    #> 
    #> [[3]]
    #> NULL
    #> 
    #> [[4]]
    #> NULL
    #> 
    #> [[5]]
    #> NULL
    #> 
    #> [[6]]
    #> NULL
    par(old_par)


Don't worry about these NULLs: mapply collects the return value of each function call in a list, and here every call returns NULL.

And the QQ-plots.

    old_par <- par(mfrow = c(2, 3))
    mapply(\(S, n) {
      main <- sprintf("S with n = %d", n)
      qqnorm(S, main = main)
      qqline(S)
    }, S_list, nvec)

    #> [[1]]
    #> NULL
    #> 
    #> [[2]]
    #> NULL
    #> 
    #> [[3]]
    #> NULL
    #> 
    #> [[4]]
    #> NULL
    #> 
    #> [[5]]
    #> NULL
    #> 
    #> [[6]]
    #> NULL
    par(old_par)
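
If the plots are hard to tell apart, a numeric summary may help. The sketch below is one possible check, not the only one: it compares the simulated excess kurtosis of the sums with the theoretical value $-1.2/n$ for a sum of $n$ iid uniforms (0 for a normal). The helper `excess_kurtosis` and the larger replication count are choices made for this example.

```r
set.seed(2022)

nvec <- seq(1, 11, by = 2)
R <- 1e5  # many replications, so the Monte Carlo noise is small

# Sample excess kurtosis: 0 for a normal, -1.2 for a single uniform.
# 'excess_kurtosis' is a hypothetical helper defined for this sketch.
excess_kurtosis <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^4) - 3
}

kurt <- sapply(nvec, function(n) {
  # each row holds one sample of size n; its row sum is one draw of S
  S <- rowSums(matrix(runif(n * R, 2, 3), nrow = R, ncol = n))
  excess_kurtosis(S)
})

data.frame(n = nvec,
           simulated = round(kurt, 3),
           theory = round(-1.2 / nvec, 3))
```

By that formula the departure from normality is already about -0.12 at n = 10 and -0.012 at n = 100, which is why the two cases look alike by eye. Increasing the number of Monte Carlo replications only reduces the noise in the histogram; it does not change the distribution of S.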

