Calculating mean and variance using Monte Carlo methods, given a density kernel
We can use numerical methods here. First, we create a function to represent your probability density function (it is not yet scaled so that its integral is 1):
pdf <- function(x) x^2 * exp(-x^2/4)
plot(pdf, xlim = c(0, 10))
We can see that almost all of the area under the curve lies where x < 10, so integrating from 0 to 100 should give a very accurate scaling factor for turning this into a true pdf:
integrate(pdf, 0, 100)$value
#> [1] 3.544908
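As a cross-check on that scaling factor, this particular kernel has a closed-form integral over the positive half-line, namely 2 * sqrt(pi). The short sketch below compares the two values; the names kernel, numeric_const and exact_const are introduced here for illustration only.

```r
# Unnormalised density kernel from above
kernel <- function(x) x^2 * exp(-x^2 / 4)

# Numerical integral over [0, 100], as in the answer
numeric_const <- integrate(kernel, 0, 100)$value

# Closed form of the integral of x^2 * exp(-x^2 / 4) over [0, Inf)
exact_const <- 2 * sqrt(pi)

c(numeric = numeric_const, exact = exact_const)
```

The two numbers should agree to several decimal places, which suggests that truncating the integral at 100 loses nothing of practical importance.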
So now we can define a genuine pdf:
pdf <- function(x) x^2 * exp(-x^2/4) / 3.544908
plot(pdf, xlim = c(0, 10))
Now that we have a pdf, we can create a cdf by numerical integration:
cdf <- function(x) sapply(x, \(i) integrate(pdf, 0, i)$value)
plot(cdf, xlim = c(0, 10))
The inverse of the cdf is what we need to convert a sample taken from a uniform distribution between 0 and 1 into a sample drawn from our new distribution. We can create an inverse function using uniroot to find where the output of our cdf matches an arbitrary number between 0 and 1:
inverse_cdf <- function(p)
{
  sapply(p, function(i) {
    uniroot(function(a) {cdf(a) - i}, c(0, 100))$root
  })
}
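One way to gain some confidence in a numerically inverted cdf is a round-trip check: feeding inverse_cdf(p) back into cdf should return approximately p. The sketch below is self-contained, so it redefines the functions (normalising by a computed constant rather than the hard-coded one); p_test and round_trip are names made up for this check.

```r
# Kernel, normalising constant, cdf and inverse cdf, rebuilt for this check
kernel <- function(x) x^2 * exp(-x^2 / 4)
norm_const <- integrate(kernel, 0, Inf)$value
cdf <- function(x) sapply(x, function(i) integrate(kernel, 0, i)$value / norm_const)
inverse_cdf <- function(p) {
  sapply(p, function(i) uniroot(function(a) cdf(a) - i, c(0, 100))$root)
}

# Round trip: cdf(inverse_cdf(p)) should be close to p
p_test <- c(0.1, 0.5, 0.9)
round_trip <- cdf(inverse_cdf(p_test))
max(abs(round_trip - p_test))
```

The remaining discrepancy is governed mainly by the default tolerance of uniroot, which can be tightened through its tol argument if needed.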
The inverse cdf looks like this:
plot(inverse_cdf, xlim = c(0, 0.99))
We are now ready to draw a sample from our distribution:
set.seed(1) # Makes this draw reproducible
x_sample <- inverse_cdf(runif(1000))
Now we can plot a histogram of our sample and check that it matches the pdf:
hist(x_sample, freq = FALSE)
plot(function(x) pdf(x), add = TRUE, xlim = c(0, 6))
Now that we are confident we have a sample drawn from the distribution of x, we can use the sample mean and standard deviation as estimates of the distribution's mean and standard deviation (squaring the standard deviation gives the variance):
mean(x_sample)
#> [1] 2.264438
sd(x_sample)
#> [1] 0.9625839
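For comparison, the mean and standard deviation can also be computed without sampling, by integrating x and x^2 against the normalised density. The sketch below does this numerically; for this kernel the results should be close to the closed-form values 4 / sqrt(pi) (about 2.2568) and sqrt(6 - 16 / pi) (about 0.9524). The names m1, m2 and exact_sd are introduced here for illustration.

```r
# Unnormalised kernel and its normalising constant
kernel <- function(x) x^2 * exp(-x^2 / 4)
norm_const <- integrate(kernel, 0, Inf)$value

# First and second moments of the normalised density
m1 <- integrate(function(x) x * kernel(x), 0, Inf)$value / norm_const
m2 <- integrate(function(x) x^2 * kernel(x), 0, Inf)$value / norm_const

# Standard deviation from the two moments
exact_sd <- sqrt(m2 - m1^2)
c(mean = m1, sd = exact_sd)
```

The Monte Carlo estimates from the sample of 1000 draws differ from these values only in the second or third decimal place, which is about what one would expect for a sample of that size.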
We can increase the accuracy of these estimates by drawing a larger sample, that is, by increasing the 1000 in our call to inverse_cdf(runif(1000)) to a larger number.
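To put a number on that accuracy, a common rule of thumb is that the standard error of a Monte Carlo mean is roughly the sample standard deviation divided by the square root of the sample size, so quadrupling the sample halves the error. The sketch below rebuilds the sampler so that it runs on its own; se_mean is a name introduced here for illustration.

```r
# Rebuild the sampler (kernel, cdf, inverse cdf) so this block is self-contained
kernel <- function(x) x^2 * exp(-x^2 / 4)
norm_const <- integrate(kernel, 0, Inf)$value
cdf <- function(x) sapply(x, function(i) integrate(kernel, 0, i)$value / norm_const)
inverse_cdf <- function(p) {
  sapply(p, function(i) uniroot(function(a) cdf(a) - i, c(0, 100))$root)
}

set.seed(1) # Makes this draw reproducible
x_sample <- inverse_cdf(runif(1000))

# Approximate standard error of the sample mean: sd / sqrt(n)
se_mean <- sd(x_sample) / sqrt(length(x_sample))
c(estimate = mean(x_sample), std_error = se_mean)
```

Because every draw requires a root search over a numerical integral, very large samples become slow with this approach, so the standard error is a useful guide to how large a sample is actually worth drawing.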
Created on 2021-11-06 by the reprex package (v2.0.0)