
Difference between sampling with replacement vs. without replacement in a particle filter

Posted 2015-07-13 14:11:30 · Tags: statistics, random-sample, particle-filter

For the resampling step of a simple particle filter, what is the difference between sampling with replacement and sampling without replacement, in terms of statistical bias and practical implications?

I suspect the "without replacement" resampling method I have in mind is not the same as the usual statistical method of sampling without replacement.

More concretely:

After the simulate and observe steps of the particle filter, I end up with a list of two-element tuples (s, p) of length N, where s is a state and p is the probability with which I believe in that state.

Sampling with replacement would be:
1. Calculate the cumulative sum of p over the tuples in the list.
2. Draw random numbers from [0, 1) and find which bucket of the cumulative sum each random number falls into; the element corresponding to that bucket is replicated as a new particle for the next round.

This is "with replacement" because the random numbers are independent of one another: on every draw, each old particle has probability p of being chosen, regardless of how many new particles have already been generated.
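The two steps above can be sketched in Python as follows. This is a minimal sketch, assuming the p values sum to 1 and using the standard-library `bisect` and `itertools.accumulate` for the cumulative-sum lookup; the function name `resample_with_replacement` is hypothetical, not from the question. It uses the same bucket convention as the practical example below: a number lying exactly on a boundary falls into the lower bucket.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def resample_with_replacement(particles, randoms):
    """Multinomial resampling (hypothetical helper).

    particles: list of (state, p) tuples whose p values sum to 1.
    randoms:   numbers in [0, 1); each one produces one new particle.
    """
    states = [s for s, _ in particles]
    cumulative = list(accumulate(p for _, p in particles))  # step 1
    new_particles = []
    for u in randoms:  # step 2: every draw is independent of the others
        # Index of the first bucket whose cumulative sum is >= u; clamp in
        # case floating-point rounding leaves the last sum slightly below u.
        i = min(bisect_left(cumulative, u), len(states) - 1)
        new_particles.append(states[i])
    return new_particles

example = [("A", 0.1), ("B", 0.2), ("C", 0.3), ("D", 0.4)]
print(resample_with_replacement(example, [0.2, 0.8, 0.4, 0.7, 0.6, 0.3, 0.9, 0.1]))
# → ['B', 'D', 'C', 'D', 'C', 'B', 'D', 'A']

# In a real filter the numbers would be fresh uniform draws:
fresh = resample_with_replacement(example, [random.random() for _ in range(8)])
```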

Sampling without replacement would be:
1. Calculate the cumulative sum of p over the tuples in the list.
2. Generate a list of floating-point numbers in an arithmetic sequence, where the i-th element (i = 1, ..., N) equals i * (1 / N). Use these in place of random numbers to look up the cumulative-sum buckets. You can picture this as slicing the cumulative-sum list with a railing whose bars are equally spaced. Again, each selected bucket's element is replicated to become a new particle.

This is "without replacement" because once the points of the arithmetic sequence have moved past a bucket, that bucket is never chosen again.
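A minimal sketch of the question's fixed-grid scheme, under the same assumptions as the cumulative-sum description (p values sum to 1; `resample_fixed_grid` is a hypothetical name). Clamping the index plays the role of the question's 0.99999999 for the last point, so that 1.0 still lands in the final bucket despite floating-point rounding.

```python
from bisect import bisect_left
from itertools import accumulate

def resample_fixed_grid(particles, n):
    """The question's "without replacement" scheme (hypothetical helper):
    look up the evenly spaced points i / n, i = 1..n, instead of random numbers."""
    states = [s for s, _ in particles]
    cumulative = list(accumulate(p for _, p in particles))
    new_particles = []
    for i in range(1, n + 1):
        u = i / n  # the equally spaced "railing" bars; the last one is 1.0
        # First bucket whose cumulative sum is >= u, clamped to the last bucket.
        j = min(bisect_left(cumulative, u), len(states) - 1)
        new_particles.append(states[j])
    return new_particles

example = [("A", 0.1), ("B", 0.2), ("C", 0.3), ("D", 0.4)]
print(resample_fixed_grid(example, 8))
# → ['B', 'B', 'C', 'C', 'D', 'D', 'D', 'D']
```

Note that nothing in this function is random: the same weights always produce the same new particle list.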

Practical example:

N = 8

(s, p) list: (A, 0.1), (B, 0.2), (C, 0.3), (D, 0.4)

With replacement, assuming the random numbers are 0.2, 0.8, 0.4, 0.7, 0.6, 0.3, 0.9, 0.1, the new particle list becomes B, D, C, D, C, B, D, A (a number that lies exactly on a cumulative-sum boundary, such as 0.3 or 0.6, falls into the lower bucket).

Without replacement, the arithmetic sequence is 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 0.99999999 (0.99999999 is used in place of 8/8 = 1.0), and the new particle list becomes B, B, C, C, D, D, D, D.

2 answers

Mehrdad is right: there is a bug in your sampling method. There are ways to fix this bug (such as redoing the process after removing each sample), but conceptually, sampling without replacement in a particle filter is just a bad idea.

The goal of the sampling step is to draw samples from the true state probability distribution at a particular timestep. Because we approximate this distribution with a finite number of particles, sampling without replacement effectively modifies the distribution after each sample, so the distribution the final sample is drawn from differs from the distribution the first sample was drawn from.

More concretely, consider a hypothetical situation in which you have two particles, in states A and B, with masses 0.01 and 0.99 respectively. If we take two samples with replacement, we will most likely (probability 0.98) get two particles in state B. If we take two samples without replacement, however, we will always get exactly one particle in each state. This throws away much of the information in the original particle distribution and replaces it with what is essentially a uniform distribution.
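The probabilities in this two-particle example work out as follows (weights 0.01 for A and 0.99 for B, two draws). With replacement the draws are independent; without replacement the second draw can only pick whichever particle is left.

```latex
\begin{aligned}
&\text{With replacement:} && P(BB) = 0.99^2 = 0.9801,\quad P(AB \text{ or } BA) = 2(0.01)(0.99) = 0.0198,\quad P(AA) = 0.01^2 = 0.0001\\
&\text{Without replacement:} && P(\text{one A and one B}) = \underbrace{0.99 \cdot 1}_{B\text{ first}} + \underbrace{0.01 \cdot 1}_{A\text{ first}} = 1
\end{aligned}
```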

That particular example is contrived, but consider that in a typical particle filter the number of particles is constant: you sample, reweight the particles, then resample the same number again. In this setting, resampling without replacement simply reproduces the original set of particles (since every particle will be sampled), completely ignoring the effect of the reweighting step!
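To illustrate this point, here is a sketch of true weighted sampling without replacement, implemented as the "redo the process after removing each sample" fix mentioned above: draw one particle by weight, remove it, and repeat. The helper name `sample_without_replacement` and the example weights are hypothetical. Drawing N particles from a pool of N returns the original set on every run; the weights only change the order.

```python
import random

def sample_without_replacement(particles, k, rng=random):
    """Weighted sampling without replacement (hypothetical helper).

    Draw one particle according to the current weights, remove it from the
    pool, and repeat k times. Scaling u by the remaining total weight has the
    same effect as renormalizing the remaining weights.
    """
    if k > len(particles):
        raise ValueError("cannot draw more particles than there are")
    pool = list(particles)  # copy of (state, weight) pairs; input is untouched
    chosen = []
    for _ in range(k):
        total = sum(w for _, w in pool)
        u = rng.random() * total
        acc = 0.0
        for idx, (state, w) in enumerate(pool):
            acc += w
            if u < acc:
                break
        # If rounding leaves u >= acc, the loop ends on the last pool entry,
        # so idx and state still refer to a valid particle.
        chosen.append(state)
        pool.pop(idx)  # this particle can never be drawn again
    return chosen

# Hypothetical weights after a reweighting step: D is by far the most likely.
reweighted = [("A", 0.05), ("B", 0.05), ("C", 0.10), ("D", 0.80)]
print(sorted(sample_without_replacement(reweighted, 4)))
# → ['A', 'B', 'C', 'D'] on every run: the reweighting has no effect on the set
```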

I believe the "without replacement" method you have described is wrong, since it guarantees that if the first element has a likelihood smaller than 1 / N, it will never be chosen, and so that state is automatically rejected by the algorithm.

Contrast the first element with an element in the middle of the list, which can still be chosen even if its likelihood is less than 1 / N, as long as its bucket happens to contain one of the evenly spaced points. This means the algorithm is biased against the first element and in favor of elements further along the list.
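The ordering effect can be checked directly. This sketch re-implements the question's fixed-grid lookup (points i / N for i = 1..N) as a hypothetical helper `fixed_grid_counts` and applies it to the same three made-up weights in two different orders, with N = 3 so that 1 / N is about 0.333. State X has weight 0.2 < 1 / N in both cases; under sampling with replacement its expected number of copies would be 3 × 0.2 = 0.6 regardless of order.

```python
from bisect import bisect_left
from itertools import accumulate

def fixed_grid_counts(particles, n):
    """Count how often each state is selected by the fixed-grid scheme
    (points i / n for i = 1..n). Deterministic: no randomness involved."""
    states = [s for s, _ in particles]
    cumulative = list(accumulate(p for _, p in particles))
    counts = {s: 0 for s in states}
    for i in range(1, n + 1):
        j = min(bisect_left(cumulative, i / n), len(states) - 1)
        counts[states[j]] += 1
    return counts

x_first = [("X", 0.2), ("Y", 0.25), ("Z", 0.55)]
x_middle = [("Y", 0.25), ("X", 0.2), ("Z", 0.55)]
print(fixed_grid_counts(x_first, 3))   # → {'X': 0, 'Y': 1, 'Z': 2}
print(fixed_grid_counts(x_middle, 3))  # → {'Y': 0, 'X': 1, 'Z': 2}
```

X is never selected when it comes first, but is always selected once when it sits in the middle; likewise Y (weight 0.25, also below 1 / N) disappears as soon as it is moved to the front.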

This is not something you want in a resampling step: everything should have a fair, nonzero chance to propagate. Otherwise, you lose the probabilistic correctness guarantees.
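For comparison, a standard way to keep the evenly spaced points while restoring a fair chance is to shift the whole grid by a single uniform random offset, a scheme usually called systematic resampling. Neither answer proposes it; it is included here only as a sketch of what "a fair nonzero chance" can look like. With points (i + offset) / N, each state's expected number of copies is N × p wherever it sits in the list. The helper name `systematic_resample` and the weights are hypothetical.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def systematic_resample(particles, n, offset=None):
    """Systematic resampling (hypothetical helper): points (i + offset) / n
    for i = 0..n-1, with ONE uniform offset in [0, 1) shared by all points."""
    states = [s for s, _ in particles]
    cumulative = list(accumulate(p for _, p in particles))
    if offset is None:
        offset = random.random()  # the only source of randomness
    new_particles = []
    for i in range(n):
        u = (i + offset) / n
        j = min(bisect_left(cumulative, u), len(states) - 1)
        new_particles.append(states[j])
    return new_particles

x_first = [("X", 0.2), ("Y", 0.25), ("Z", 0.55)]
print(systematic_resample(x_first, 3, offset=0.3))  # → ['X', 'Y', 'Z']
print(systematic_resample(x_first, 3, offset=0.9))  # → ['Y', 'Z', 'Z']
```

Here X, although it is first and has weight 0.2 < 1 / 3, is selected whenever the first point offset / 3 is at most 0.2, that is, with probability 0.6 = 3 × 0.2.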