Adding elements to an array with a probability

Question

So I am building a list in Python, for example the first 100 integers, but I do not need all 100 integers, only a sample of, say, 3.

import random 

def f():
    list_ = []
    for i in range(100):
        list_.append(i)
    return list_

def g(list_, k):
    return random.sample(list_, k)

print(g(f(),3))

>>>[50, 92, 6]

Now, can I get away with not building the whole list in the first place and instead build the sample directly, maybe by adding a probability with which elements get added to the list in f()?

Because if I am building a huge list that contains not integers but some other objects, this approach could be costly in terms of memory and computation.

Answer 1

def random_no_dups_k_of_n(k, n):
    res = list(range(k))
    for i in range(k, n):
        v = random.randint(0, i) # this is 0-i inclusive
        if v == i:
            ir = random.randint(0,k-1)
            res[ir] = i
    return res

What's happening here: it's a telescoping product. Each element from 0 to k-1 starts out having a k/k chance of being selected. After the 1st iteration, k has a 1/(k+1) chance of being selected, while all others (not just the remaining ones, but all) have a (k-1)/k * k/(k+1) = (k-1)/(k+1) chance of being selected. After the 2nd iteration, k+1 has a 1/(k+2) chance of being selected, while all the others have a (k-1)/(k+1) * (k+1)/(k+2) = (k-1)/(k+2) chance of being selected. And so on. In the end, each number will have a k/n chance of being selected.

Actually, I just saw that you can simply do random.sample(range(n), k). I had assumed it wasn't available.
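
A minimal sketch of that shortcut, assuming Python 3, where range(n) is a lazy sequence rather than a list. The values of n and k are illustrative only.

```python
import random

n = 10**9   # illustrative size; a range object stores only start, stop and step
k = 3

# random.sample accepts any sequence, and a range object qualifies,
# so the n integers are never materialized as a list.
sample = random.sample(range(n), k)
print(sample)
```

This only helps when the population can be indexed like a sequence. It does not apply directly to objects that are produced one at a time.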

EDIT: I got the probabilities reversed above. The correct version should be:

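import random

def random_no_dups_k_of_n(k, n):
    res = list(range(k))  # start with the first k integers
    for i in range(k, n):
        v = random.randint(0, i)  # 0..i inclusive, so i+1 outcomes
        if v < k:  # true with probability k/(i+1)
            ir = random.randint(0, k-1)  # slot to overwrite
            res[ir] = i
    return res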

Each element from 0 to k-1 starts out having a k/k chance of being selected. After the 1st iteration, k has a k/(k+1) chance of being selected, while all others (not just the remaining ones, but all) have a k/k * ((k-1)/k * k/(k+1) + 1/(k+1)) = k/(k+1) chance of being selected. After the 2nd iteration, k+1 has a k/(k+2) chance of being selected, while all the others have a k/(k+1) * ((k-1)/k * k/(k+2) + 2/(k+2)) = k/(k+2) chance of being selected.
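
The same step-by-step product can be checked with exact fractions. This is only a sketch: survival_probability is a hypothetical helper written for illustration, and it tracks one of the initial k elements through m iterations of the corrected loop.

```python
from fractions import Fraction

def survival_probability(k, m):
    """Exact chance that one of the initial k elements is still in res after m steps."""
    p = Fraction(1)  # k/k at the start
    for step in range(1, m + 1):
        replace = Fraction(k, k + step)        # v < k, so some slot is overwritten
        keep_slot = Fraction(k - 1, k)         # ...but not this element's slot
        no_replace = Fraction(step, k + step)  # v >= k, nothing changes
        p *= replace * keep_slot + no_replace
    return p

print(survival_probability(3, 1))   # 3/4
print(survival_probability(3, 2))   # 3/5
print(survival_probability(3, 97))  # 3/100
```

With k = 3 and n = 100 there are 97 iterations, and the last line corresponds to the final k/n chance.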

And this actually does collapse all the calculations, giving each element a k/(k+m) chance after the m-th step.
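
For the case in the question, where the items are arbitrary objects rather than integers, the same idea can be sketched over any iterable. The name reservoir_sample and the "object-i" strings are assumptions made for illustration. This variant also reuses the first random draw as the slot index instead of drawing a second number, which is a small change from the function above, not the original code.

```python
import random

def reservoir_sample(iterable, k):
    """Return k items chosen uniformly at random from an iterable of unknown length."""
    reservoir = []
    for i, item in enumerate(iterable):
        if i < k:
            reservoir.append(item)       # fill the reservoir with the first k items
        else:
            j = random.randint(0, i)     # 0..i inclusive
            if j < k:                    # happens with probability k/(i+1)
                reservoir[j] = item      # given j < k, j is uniform over the k slots
    return reservoir

# Items are produced one at a time, so at most k of them are held at once.
objects = (f"object-{i}" for i in range(100))
print(reservoir_sample(objects, 3))
```

If the iterable yields fewer than k items, this sketch returns all of them.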

Answer 1
3, accepted, 2017-04-13 02:09:43
