Drawing from truncated normal distribution delivers wrong standard deviation in R
I draw random numbers from a truncated normal distribution. After truncation at 0 from the left, the distribution is supposed to have mean 100 and standard deviation 60. I wrote an algorithm that computes the mean and sd of the normal distribution prior to the truncation (mean_old and sd_old). The function vtruncnorm gives me the (wanted) variance of 60^2. However, when I draw random variables from the distribution, the standard deviation is around 96. I don't understand why the sd of the random draws differs from the computed value of 60.
I tried increasing the number of draws; the sd still comes out around 96.
require(truncnorm)
mean_old = -5425.078
sd_old = 745.7254
val = rtruncnorm(10000, a=0, mean = mean_old, sd = sd_old)
sd(val)
sqrt(vtruncnorm( a=0, mean = mean_old, sd = sd_old))
OK, I did a quick test:
require(truncnorm)
val = rtruncnorm(1000000, a=7.2, mean = 0.0, sd = 1.0)
sd(val)
sqrt(vtruncnorm( a=7.2, mean = 0.0, sd = 1.0))
Canonical truncated Gaussian. At a=6 the two values are very close, for example 0.1554233 vs 0.1548865, depending on the seed etc. At a=7 they are systematically different: 0.1358143 vs 0.1428084 (the sampled value is smaller than the value from the function call). I've checked with a Python implementation:
import numpy as np
from scipy.stats import truncnorm
a, b = 7.0, 100.0
mean, var, skew, kurt = truncnorm.stats(a, b, moments='mvsk')
print(np.sqrt(var))
r = truncnorm.rvs(a, b, size=100000)
print(np.sqrt(np.var(r)))
and got back 0.1428083662823426, which is consistent with the R vtruncnorm result. At your a=7.2 or so, the results are even worse.
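One way to cross-check these numbers without relying on either package is the closed-form expression for a normal truncated from the left: with alpha = (a - mean)/sd and lambda = pdf(alpha) / P(Z >= alpha), the truncated mean is mean + sd*lambda and the truncated variance is sd^2 * (1 - lambda*(lambda - alpha)). The sketch below evaluates this in Python with the standard library only. The function name left_truncated_moments is hypothetical, and the tail probability is taken from math.erfc on the assumption that it stays accurate this far into the tail.

```python
import math

def left_truncated_moments(a, mean=0.0, sd=1.0):
    """Mean and sd of N(mean, sd^2) conditioned on X >= a (closed form).

    Hypothetical helper for illustration; not part of truncnorm or SciPy.
    """
    alpha = (a - mean) / sd                            # standardized truncation point
    pdf = math.exp(-0.5 * alpha * alpha) / math.sqrt(2.0 * math.pi)
    tail = 0.5 * math.erfc(alpha / math.sqrt(2.0))     # P(Z >= alpha) via erfc
    lam = pdf / tail                                   # inverse Mills ratio
    variance = 1.0 - lam * (lam - alpha)               # Var[Z | Z >= alpha]
    return mean + sd * lam, sd * math.sqrt(variance)

if __name__ == "__main__":
    for a in (6.0, 7.0, 7.2):
        print(a, left_truncated_moments(a))
    # Parameters from the question: truncation at 0 of N(mean_old, sd_old^2)
    print(left_truncated_moments(0.0, mean=-5425.078, sd=745.7254))
```

Evaluated this way, the sd comes out at about 0.135 for a=7 and about 97 for the parameters in the question, which is closer to the sampled figures than to the vtruncnorm and truncnorm.stats values quoted above. That may point to the variance computation, rather than the sampler, as the place to look, so the comparison is worth re-running against current package versions.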
Moral of the story: at high values of a, sampling from rtruncnorm has a bug. Python has the same problem as well.
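To see which of the two numbers an independent sampler lands on, the following is a minimal sketch of a tail sampler written from scratch. It uses rejection from a shifted exponential proposal, a scheme commonly attributed to Robert (1995), and it is not the algorithm used inside rtruncnorm or SciPy as far as this sketch assumes. The function name sample_left_truncated_std_normal, the seed, and the sample size are illustrative choices.

```python
import math
import random

def sample_left_truncated_std_normal(a, n, rng):
    """Draw n values from N(0, 1) conditioned on Z >= a, for a > 0.

    Rejection sampling with proposal a + Exp(rate); hypothetical helper.
    """
    rate = 0.5 * (a + math.sqrt(a * a + 4.0))    # rate that maximizes the acceptance probability
    draws = []
    while len(draws) < n:
        z = a + rng.expovariate(rate)            # candidate from the shifted exponential
        # Accept with probability exp(-(z - rate)^2 / 2)
        if rng.random() <= math.exp(-0.5 * (z - rate) ** 2):
            draws.append(z)
    return draws

rng = random.Random(12345)                       # fixed seed for repeatability
draws = sample_left_truncated_std_normal(7.0, 100_000, rng)
sample_mean = math.fsum(draws) / len(draws)
sample_sd = math.sqrt(math.fsum((x - sample_mean) ** 2 for x in draws) / len(draws))
print(sample_sd)
```

If this sampler is right, its sd at a=7 should sit near the sampled 0.1358 rather than the 0.1428 returned by the variance functions, which would suggest the samplers are behaving and the discrepancy comes from elsewhere. Treat that as a hint to investigate, not as a verdict on either library.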