
Sampling methods

Original post 2013-02-19 09:08:50 5 1 python/ sampling

Can you help me out with these questions? I'm using Python.

Sampling Methods

Sampling (or Monte Carlo) methods form a general and useful set of techniques that use random numbers to extract information about (multivariate) distributions and functions. In the context of statistical machine learning, we are most often concerned with drawing samples from distributions to obtain estimates of summary statistics such as the mean value of the distribution in question.

When we have access to a uniform (pseudo) random number generator on the unit interval (rand in Matlab or runif in R), we can use the transformation sampling method described in Bishop Sec. 11.1.1 to draw samples from more complex distributions. Implement the transformation method for the exponential distribution

$$p(y) = \lambda \exp(-\lambda y), \quad y \geq 0$$

using the expressions given at the bottom of page 526 in Bishop: Slice sampling involves augmenting z with an additional variable u and then drawing samples from the joint (z,u) space.
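For what it's worth, the transformation method for this density comes down to inverting the cumulative distribution function: the CDF is $1 - \exp(-\lambda y)$, so a uniform draw $u$ on the unit interval maps to $y = -\ln(1 - u)/\lambda$. Below is a minimal sketch of that idea using only the standard library. The function name `sample_exponential`, the rate `lam = 2.0`, and the seed are illustrative assumptions, not part of the assignment.

```python
import math
import random


def sample_exponential(lam, n, rng=random):
    """Draw n samples from p(y) = lam * exp(-lam * y), y >= 0, by inverting the CDF."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    # rng.random() is uniform on [0, 1), so 1 - u lies in (0, 1] and log() is defined.
    return [-math.log(1.0 - rng.random()) / lam for _ in range(n)]


rng = random.Random(0)  # fixed seed so the run is repeatable (assumed value)
lam = 2.0               # example rate, chosen arbitrarily
samples = sample_exponential(lam, 100_000, rng)
sample_mean = sum(samples) / len(samples)
print(f"sample mean = {sample_mean:.4f}, true mean = {1.0 / lam:.4f}")
```

The sample mean should land close to $1/\lambda$, which is a quick sanity check that the transformation is right.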

The crucial point of sampling methods is how many samples are needed to obtain a reliable estimate of the quantity of interest. Let us say we are interested in estimating the mean, which is

$$\mu_y = 1/\lambda$$

in the above distribution, we then use the sample mean

$$b_y = \frac{1}{L} \sum_{\ell=1}^{L} y(\ell)$$

of the L samples as our estimator. Since we can generate as many samples of size L as we want, we can investigate how this estimate on average converges to the true mean. To do this properly we need to take the absolute difference

$$|\mu_y - b_y|$$

between the true mean $\mu_y$ and the estimate $b_y$, averaged over many (say 1000) repetitions, for several values of $L$ (say 10, 100, 1000). Plot the expected absolute deviation as a function of $L$. Can you plot some transformed value of the expected absolute deviation to get a more or less straight line, and what does this mean?
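One way to set up this experiment is sketched below, again with the standard library only. It estimates the expected absolute deviation for each $L$ and then fits a straight line to $\log_{10}$ of the deviation against $\log_{10} L$. The helper names, `lam = 2.0`, and the seed are assumptions made for illustration.

```python
import math
import random


def sample_mean_exponential(lam, L, rng):
    """Mean of L exponential samples drawn with the transformation method."""
    return sum(-math.log(1.0 - rng.random()) / lam for _ in range(L)) / L


def expected_abs_deviation(lam, L, repetitions, rng):
    """Average of |mu_y - b_y| over independent repetitions."""
    mu = 1.0 / lam
    total = 0.0
    for _ in range(repetitions):
        total += abs(mu - sample_mean_exponential(lam, L, rng))
    return total / repetitions


rng = random.Random(0)  # fixed seed (assumed) so the run is repeatable
lam = 2.0               # example rate, chosen arbitrarily
sizes = [10, 100, 1000]
deviations = [expected_abs_deviation(lam, L, 1000, rng) for L in sizes]

# Least-squares slope of log10(deviation) against log10(L).
xs = [math.log10(L) for L in sizes]
ys = [math.log10(d) for d in deviations]
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))

for L, d in zip(sizes, deviations):
    print(f"L = {L:5d}  mean |mu_y - b_y| = {d:.4f}")
print(f"log-log slope = {slope:.3f}")
```

On log-log axes the points should fall on a roughly straight line with slope near $-1/2$, which says the error shrinks like $1/\sqrt{L}$: ten times more samples buys only about a factor of three in accuracy. If matplotlib is installed, passing `sizes` and `deviations` to `matplotlib.pyplot.loglog` should produce the plot itself.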

I'm new to this kind of statistical machine learning and really don't know how to implement it in Python. Can you help me out?

1 answer

There are a few shortcuts you can take. Python has some built-in methods to do sampling, mainly in the SciPy library. I can recommend a manuscript that implements this idea in Python (disclaimer: I am the author), located here.

It is part of a larger book, but this isolated chapter deals with the more general Law of Large Numbers + convergence, which is what you are describing. The paper deals with Poisson random variables, but you should be able to adapt the code to your own situation.
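As an illustration of the kind of shortcut meant here, the sketch below draws exponential samples with a ready-made sampler instead of a hand-written transformation. The answer does not say which routine it had in mind, so this is an assumption: it uses `random.expovariate` from the standard library, and `scipy.stats.expon` with `scale=1/lam` would be the SciPy counterpart. The rate and seed are arbitrary.

```python
import random

rng = random.Random(0)  # fixed seed (assumed) so the run is repeatable
lam = 2.0               # example rate, chosen arbitrarily
L = 100_000

# expovariate takes the rate lambda directly, so the mean is 1 / lam.
builtin_samples = [rng.expovariate(lam) for _ in range(L)]
builtin_mean = sum(builtin_samples) / L
print(f"sample mean = {builtin_mean:.4f}, true mean = {1.0 / lam:.4f}")
```

A ready-made sampler like this is handy as a cross-check on your own transformation code, though the exercise itself asks for the transformation method.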