
Find proportional sampling using Python

I'm given a problem that explicitly asks me not to use numpy and pandas.

Problem: Select an element from the list A randomly, with probability proportional to its magnitude. Assume we are doing the same experiment 100 times with replacement; in each experiment you will print a number that is selected randomly from A.

Ex 1: A = [0 5 27 6 13 28 100 45 10 79]
Let f(x) denote the number of times x gets selected in 100 experiments.
f(100) > f(79) > f(45) > f(28) > f(27) > f(13) > f(10) > f(6) > f(5) > f(0)
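For reference, each element's target probability is its value divided by the total of A, which is 313, so on average f(x) should be close to 100 * x / 313 (about 32 for 100, about 25 for 79, and exactly 0 for 0). The ordering above therefore holds in expectation; in a single run of only 100 draws, close neighbours such as 27 and 28 (expected about 8.6 and 8.9 times) can easily swap. A small sketch that computes the exact numbers, using fractions.Fraction only so the probabilities print exactly:

```python
from fractions import Fraction

A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]

# Target probability of each element: its value divided by the total (313 here)
total = sum(A)
probs = {x: Fraction(x, total) for x in A}

# Expected number of selections in 100 experiments: 100 * P(x)
expected = {x: float(100 * p) for x, p in probs.items()}

# Print value, exact probability, and expected count, largest first
for x in sorted(A, reverse=True):
    print(x, probs[x], round(expected[x], 2))
```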

Initially, I took the sum of all the elements of list A.

I then divided each element of list A by the sum (in order to normalize) and stored each of these values in another list (d_dash).

I then created another empty list (d_bar) that holds the cumulative sum of all elements of d_dash.

I created a variable r, where r = random.uniform(0.0,1.0), and then, for each index k up to the length of d_dash, compared r to d_dash[k]: if r <= d_dash[k], return A[k].

However, I'm getting the error list index out of range near d_dash[j].append((A[j]/sum)). I'm not sure what the issue is here, as I did not exceed the index of either d_dash or A.

Also, is my logic correct? Sharing a better way to do this would be appreciated.

Thanks in advance.

import random

A = [0,5,27,6,13,28,100,45,10,79]

def propotional_sampling(A):
    sum=0
    for i in range(len(A)):
        sum = sum + A[i]

    d_dash=[]

    for j in range(len(A)):
        d_dash[j].append((A[j]/sum))

    #cumulative sum

    d_bar =[]
    d_bar[0]= 0

    for k in range(len(A)):
        d_bar[k] = d_bar[k] + d_dash[k]

    r = random.uniform(0.0,1.0)
    number=0

    for p in range(len(d_bar)):
        if(r<=d_bar[p]):
            number=d_bar[p]
    return number

def sampling_based_on_magnitued():
    for i in range(1,100):
        number = propotional_sampling(A)
        print(number)

sampling_based_on_magnitued()

Below is code that does the same thing:

import random

A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]

#Sum of all the elements in the list
S = sum(A)

#Normalizing each element by the sum (the values now add up to 1)
norm_sum = [ele/S for ele in A]

#Calculating cumulative normalized sum
cum_norm_sum = []
cum_norm_sum.append(norm_sum[0])
for itr in range(1, len(norm_sum)):
    cum_norm_sum.append(cum_norm_sum[-1] + norm_sum[itr])

def prop_sampling(cum_norm_sum):
    """
    This function returns an element of A
    with proportional sampling.
    """
    r = random.random()  # r is in [0.0, 1.0)
    for itr in range(len(cum_norm_sum)):
        if r < cum_norm_sum[itr]:
            return A[itr]
    # Fallback in case floating-point rounding leaves the last value just below 1.0
    return A[-1]

#Sampling 1000 elements from the given list with proportional sampling
sampled_elements = []
for itr in range(1000):
    sampled_elements.append(prop_sampling(cum_norm_sum))

The image below shows the frequency of each element in the sampled points:

[Image: frequency of each element in the sampled points]

Clearly, the number of times each element appears is (approximately) proportional to its magnitude.
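A text-only way to check this claim is to count the samples with collections.Counter and compare each observed share with x / sum(A). The sketch below repeats the cumulative-sum search from the answer above in a self-contained form; the helper name build_cumulative and the seeded random.Random(0) generator are assumptions added only for illustration and reproducibility:

```python
import random
from collections import Counter

A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]

def build_cumulative(weights):
    """Cumulative normalized sums (hypothetical helper, same idea as above)."""
    total = sum(weights)
    cum, running = [], 0.0
    for w in weights:
        running += w / total
        cum.append(running)
    return cum

def prop_sampling(values, cum, rng):
    r = rng.random()  # r is in [0.0, 1.0)
    for value, c in zip(values, cum):
        if r < c:
            return value
    return values[-1]  # guard against rounding in the last cumulative value

rng = random.Random(0)  # seeded only so the run is reproducible
cum = build_cumulative(A)
n = 100_000
counts = Counter(prop_sampling(A, cum, rng) for _ in range(n))

# Observed share next to the target share x / sum(A)
for x in sorted(A, reverse=True):
    print(x, counts[x] / n, round(x / sum(A), 4))
```

With many draws the observed shares settle close to the targets; with only 100 draws they fluctuate noticeably.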

The cumulative sum can be computed with itertools.accumulate. The loop:

for p in range(len(d_bar)):
    if(r<=d_bar[p]):
        number=d_bar[p]

can be replaced by bisect.bisect() (see the bisect documentation):

import random
from itertools import accumulate
from bisect import bisect

A = [0,5,27,6,13,28,100,45,10,79]

def propotional_sampling(A, n=100):
    # calculate cumulative sum from A:
    cum_sum = [*accumulate(A)]
    # cum_sum = [0, 5, 32, 38, 51, 79, 179, 224, 234, 313]

    out = []
    for _ in range(n):
        i = random.random()                     # i = [0.0, 1.0)
        idx = bisect(cum_sum, i*cum_sum[-1])    # get index to list A
        out.append(A[idx])

    return out

print(propotional_sampling(A))

Prints (for example):

[10, 100, 100, 79, 28, 45, 45, 27, 79, 79, 79, 79, 100, 27, 100, 100, 100, 13, 45, 100, 5, 100, 45, 79, 100, 28, 79, 79, 6, 45, 27, 28, 27, 79, 100, 79, 79, 28, 100, 79, 45, 100, 10, 28, 28, 13, 79, 79, 79, 79, 28, 45, 45, 100, 28, 27, 79, 27, 45, 79, 45, 100, 28, 100, 100, 5, 100, 79, 28, 79, 13, 100, 100, 79, 28, 100, 79, 13, 27, 100, 28, 10, 27, 28, 100, 45, 79, 100, 100, 100, 28, 79, 100, 45, 28, 79, 79, 5, 45, 28]
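Why bisect works here: bisect.bisect is bisect_right, so for a target t it returns the index of the first cumulative value strictly greater than t. Element i is therefore chosen when t falls in [cum_sum[i-1], cum_sum[i]), an interval whose width is exactly A[i]; a zero-weight element such as 0 gets an empty interval and is never picked. A few deterministic probes (the helper name pick is hypothetical) illustrate this:

```python
from bisect import bisect
from itertools import accumulate

A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]
cum_sum = list(accumulate(A))  # [0, 5, 32, 38, 51, 79, 179, 224, 234, 313]

def pick(t):
    """Map a target t in [0, cum_sum[-1]) to an element of A."""
    return A[bisect(cum_sum, t)]

print(pick(0))      # 5: t = 0 lands after the zero-width slot of 0
print(pick(4.99))   # 5: still inside [0, 5)
print(pick(5))      # 27: a boundary value belongs to the next interval
print(pick(312.9))  # 79: inside [234, 313)
```

Because random.random() never returns 1.0, the scaled target stays below cum_sum[-1], so the index never runs past the end of A.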

The reason you got the "list index out of range" message is that you created empty lists and then indexed into them: d_dash = [] followed by d_dash[j].append(...), and likewise d_bar = [] followed by d_bar[0] = 0 and d_bar[k] = d_bar[k] + d_dash[k]. An empty list has no index 0, so each of these raises IndexError. I recommend using the following structure instead. First, define the list this way:

d_bar=[0 for i in range(len(A))]
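A small sketch of the difference, with variable names mirroring the question: indexing an empty list fails for both reads and writes, while a pre-sized list or calling append on the list itself works. Note that d_dash[j].append(...) would be wrong even if the index existed, since d_dash[j] would be a float, which has no append method.

```python
A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]
total = sum(A)

errors = []

# The question's pattern: reading d_dash[0] on an empty list fails before .append runs
d_dash = []
try:
    d_dash[0].append(A[0] / total)
except IndexError as exc:
    errors.append(str(exc))

# Assigning to a missing index fails too (d_bar[0] = 0 in the question)
d_bar = []
try:
    d_bar[0] = 0
except IndexError as exc:
    errors.append(str(exc))

# Fix 1: pre-size the list, then assign by index
d_bar = [0 for i in range(len(A))]
d_bar[0] = A[0] / total

# Fix 2: start empty and call append on the list itself, not on an element
d_dash = []
for a in A:
    d_dash.append(a / total)

print(errors)
print(len(d_bar), len(d_dash))
```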

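Also, I believe this code will always return 1 (the last cumulative value, up to floating-point rounding), because there is no break in the loop, so number keeps being overwritten by every later match. It should also return A[p] rather than the cumulative value d_bar[p]. You can resolve this issue by adding a break. Here is an updated version of your code: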

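import random

A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]

def pick_a_number_from_list(A):
    # Total of all elements ("total" avoids shadowing the built-in sum)
    total = 0
    for i in A:
        total += i
    # Normalize each element by the total
    A_norm = []
    for j in A:
        A_norm.append(j / total)
    # Cumulative sum of the normalized values
    A_cum = [0 for i in range(len(A))]
    A_cum[0] = A_norm[0]
    for k in range(len(A_norm) - 1):
        A_cum[k + 1] = A_cum[k] + A_norm[k + 1]

    r = random.uniform(0.0, 1.0)
    number = 0

    # Return the first element whose cumulative value reaches r
    for p in range(len(A_cum)):
        if r <= A_cum[p]:
            number = A[p]
            break
    return number

def sampling_based_on_magnitued():
    # range(100) runs the experiment exactly 100 times (range(1, 100) gives only 99)
    for i in range(100):
        number = pick_a_number_from_list(A)
        print(number)

sampling_based_on_magnitued()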
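Since only numpy and pandas are ruled out, the standard library's random.choices (Python 3.6+) can also do weighted sampling with replacement directly; passing the list itself as weights makes each element's probability proportional to its magnitude. A minimal sketch, offered as an alternative to the hand-written cumulative-sum approach rather than as what the exercise intends; the function name sampling_with_choices is hypothetical:

```python
import random
from collections import Counter

A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]

def sampling_with_choices(values, n=100, rng=random):
    """Draw n elements with replacement, each weighted by its own magnitude."""
    return rng.choices(values, weights=values, k=n)

samples = sampling_with_choices(A)
for x in samples:
    print(x)

# Optional: tally how often each element was selected
print(Counter(samples).most_common())
```

random.choices also accepts cum_weights=..., so a cumulative list built with itertools.accumulate could be passed instead of weights to skip recomputing it on every call.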

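Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you need to repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.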

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM