Find proportional sampling using Python
I'm given a problem that explicitly asks me not to use NumPy and pandas.
Problem: select an element from list A at random, with probability proportional to its magnitude. Assume we repeat the same experiment 100 times with replacement; in each experiment, print the number that was randomly selected from A.
Ex 1: A = [0 5 27 6 13 28 100 45 10 79]
Let f(x) denote the number of times x gets selected in 100 experiments.
f(100) > f(79) > f(45) > f(28) > f(27) > f(13) > f(10) > f(6) > f(5) > f(0)
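Under proportional sampling each element x is picked with probability x / sum(A), so the ordering above describes expected counts. In a single run of 100 draws the observed counts can deviate from it, especially for close weights such as 27 and 28 (expected about 8.6 vs 8.9 selections). A small sketch computing those expected values for this example (the names total, probs and expected are illustrative only):

```python
A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]

total = sum(A)  # 313 for this list

# Probability of each element under proportional sampling
probs = {x: x / total for x in A}

# Expected number of selections in 100 experiments
expected = {x: 100 * p for x, p in probs.items()}

for x in sorted(A, reverse=True):
    print(f"{x:>3}: p = {probs[x]:.4f}, expected count = {expected[x]:.1f}")
```

An element with magnitude 0 has probability 0, so f(0) should always be 0.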
Initially, I took the sum of all the elements of list A.
I then divided each element of list A by that sum (to normalize it) and stored each of these values in another list (d_dash).
I then created another empty list (d_bar) to hold the cumulative sum of the elements of d_dash.
I created a variable r, where r = random.uniform(0.0, 1.0), and then, for the length of d_dash, compared r to d_dash[k]: if r <= d_dash[k], return A[k].
However, I'm getting the error "list index out of range" near d_dash[j].append((A[j]/sum)). I'm not sure what the issue is, as I did not exceed the index of either d_dash or A[j].
Also, is my logic correct? A better way to do this would be appreciated.
Thanks in advance.
```python
import random
A = [0,5,27,6,13,28,100,45,10,79]
def propotional_sampling(A):
    sum=0
    for i in range(len(A)):
        sum = sum + A[i]
    d_dash=[]
    for j in range(len(A)):
        d_dash[j].append((A[j]/sum))
    #cumulative sum
    d_bar =[]
    d_bar[0]= 0
    for k in range(len(A)):
        d_bar[k] = d_bar[k] + d_dash[k]
    r = random.uniform(0.0,1.0)
    number=0
    for p in range(len(d_bar)):
        if(r<=d_bar[p]):
            number=d_bar[p]
    return number
def sampling_based_on_magnitued():
    for i in range(1,100):
        number = propotional_sampling(A)
        print(number)
sampling_based_on_magnitued()
```
Below is code that does the same:
```python
import random

A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]

# Sum of all the elements in the list
S = sum(A)

# Normalized weights: each element divided by the total
norm_sum = [ele / S for ele in A]

# Cumulative normalized sum
cum_norm_sum = []
cum_norm_sum.append(norm_sum[0])
for itr in range(1, len(norm_sum)):
    cum_norm_sum.append(cum_norm_sum[-1] + norm_sum[itr])

def prop_sampling(cum_norm_sum):
    """
    This function returns an element
    with proportional sampling.
    """
    r = random.random()
    for itr in range(len(cum_norm_sum)):
        if r < cum_norm_sum[itr]:
            return A[itr]
    # Guard against floating-point round-off leaving the last
    # cumulative value slightly below 1.0 (would otherwise return None)
    return A[-1]

# Sampling 1000 elements from the given list with proportional sampling
sampled_elements = []
for itr in range(1000):
    sampled_elements.append(prop_sampling(cum_norm_sum))
```
The image below shows the frequency of each element in the sampled points:
Clearly, the number of times each element appears is roughly proportional to its magnitude.
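A text-based way to check the same frequencies: the sketch below repeats the approach of the answer above (so the block runs on its own) and tallies the results with collections.Counter, printing observed versus expected proportions. The fixed seed is an assumption added only to make a run repeatable.

```python
import random
from collections import Counter

A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]
S = sum(A)

# Cumulative normalized weights, built the same way as in the answer
cum_norm_sum = []
running = 0.0
for ele in A:
    running += ele / S
    cum_norm_sum.append(running)

def prop_sampling(cum_norm_sum):
    r = random.random()
    for itr, threshold in enumerate(cum_norm_sum):
        if r < threshold:
            return A[itr]
    return A[-1]  # round-off guard

random.seed(0)  # fixed seed only so the run is repeatable
sampled_elements = [prop_sampling(cum_norm_sum) for _ in range(1000)]

# Compare observed frequencies with the expected probabilities x / S
counts = Counter(sampled_elements)
for x in sorted(A, reverse=True):
    observed = counts[x] / len(sampled_elements)
    print(f"{x:>3}: observed {observed:.3f}, expected {x / S:.3f}")
```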
The cumulative sum can be computed with itertools.accumulate. The loop:
```python
for p in range(len(d_bar)):
    if(r<=d_bar[p]):
        number=d_bar[p]
```
can be replaced by bisect.bisect() (doc):
```python
import random
from itertools import accumulate
from bisect import bisect

A = [0,5,27,6,13,28,100,45,10,79]

def propotional_sampling(A, n=100):
    # calculate cumulative sum from A:
    cum_sum = [*accumulate(A)]
    # cum_sum = [0, 5, 32, 38, 51, 79, 179, 224, 234, 313]
    out = []
    for _ in range(n):
        i = random.random()  # i is in [0.0, 1.0)
        idx = bisect(cum_sum, i*cum_sum[-1])  # get index to list A
        out.append(A[idx])
    return out

print(propotional_sampling(A))
```
Prints (for example):
[10, 100, 100, 79, 28, 45, 45, 27, 79, 79, 79, 79, 100, 27, 100, 100, 100, 13, 45, 100, 5, 100, 45, 79, 100, 28, 79, 79, 6, 45, 27, 28, 27, 79, 100, 79, 79, 28, 100, 79, 45, 100, 10, 28, 28, 13, 79, 79, 79, 79, 28, 45, 45, 100, 28, 27, 79, 27, 45, 79, 45, 100, 28, 100, 100, 5, 100, 79, 28, 79, 13, 100, 100, 79, 28, 100, 79, 13, 27, 100, 28, 10, 27, 28, 100, 45, 79, 100, 100, 100, 28, 79, 100, 45, 28, 79, 79, 5, 45, 28]
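Why the bisect version picks each element with the right probability: the random value is scaled to [0, cum_sum[-1]), and bisect returns the insertion point to the right of equal entries, so a value x in [cum_sum[i-1], cum_sum[i]) maps to index i. Each index therefore owns an interval of width A[i], and the zero-width interval for A[0] = 0 is never hit. A small deterministic check (the probe values are arbitrary):

```python
from bisect import bisect
from itertools import accumulate

A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]
cum_sum = list(accumulate(A))  # [0, 5, 32, 38, 51, 79, 179, 224, 234, 313]

# Map a few scaled values to indices of A
for x in [0, 4.9, 5, 31.9, 312.9]:
    idx = bisect(cum_sum, x)
    print(x, "->", idx, "->", A[idx])

# Width of the interval owned by each index equals the element itself
lower = [0] + cum_sum[:-1]
widths = [hi - lo for lo, hi in zip(lower, cum_sum)]
print(widths == A)
```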
The reason you got the "list index out of range" message is that you created an empty list, "d_bar = []", and then started assigning values to it with "d_bar[k] = d_bar[k] + d_dash[k]". The same problem applies to d_dash: d_dash[j] indexes a list that is still empty (and even with elements, you would want d_dash.append(...), not d_dash[j].append(...)). I recommend using the following structure instead. First, define it this way:
```python
d_bar=[0 for i in range(len(A))]
```
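A minimal illustration of the difference (the list names here are only for the demo): assigning to an index of an empty list raises IndexError, while either a preallocated list or append() works.

```python
values = [0.1, 0.2, 0.3]

# Assigning into an empty list fails, as with d_bar[0] = 0 on d_bar = []
empty = []
error_message = None
try:
    empty[0] = 0
except IndexError as exc:
    error_message = str(exc)
    print("IndexError:", error_message)

# Option 1: preallocate, then assign by index
prealloc = [0 for _ in range(len(values))]
for k in range(len(values)):
    prealloc[k] = values[k]

# Option 2: start empty and append
appended = []
for v in values:
    appended.append(v)

print(prealloc == appended == values)
```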
Also, I believe this code will almost always return 1 (the last cumulative value), because there is no break in the loop, so number keeps being overwritten until the final match. You can resolve this by adding a break (and by returning A[p] rather than d_bar[p]). Here is an updated version of your code:
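```python
import random

A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]

def pick_a_number_from_list(A):
    # Total of all elements (named total to avoid shadowing the built-in sum)
    total = 0
    for i in A:
        total += i

    # Normalize each element by the total
    A_norm = []
    for j in A:
        A_norm.append(j / total)

    # Cumulative sum of the normalized values, preallocated with zeros
    A_cum = [0 for i in range(len(A))]
    A_cum[0] = A_norm[0]
    for k in range(len(A_norm) - 1):
        A_cum[k + 1] = A_cum[k] + A_norm[k + 1]

    r = random.uniform(0.0, 1.0)
    # Fallback in case round-off leaves A_cum[-1] slightly below r
    number = A[-1]
    for p in range(len(A_cum)):
        # strict < so that a zero-weight element can never be chosen
        if r < A_cum[p]:
            number = A[p]
            break  # stop at the first cumulative value that exceeds r
    return number

def sampling_based_on_magnitued():
    # range(100) runs the 100 experiments the problem asks for
    for i in range(100):
        number = pick_a_number_from_list(A)
        print(number)

sampling_based_on_magnitued()
```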
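Since the question also asks for a better way: the standard library's random.choices (Python 3.6+) already performs weighted sampling with replacement and accepts either weights or precomputed cum_weights. This is a library-based alternative rather than the hand-written algorithm in the answers, and it stays within the "no NumPy/pandas" restriction. A minimal sketch:

```python
import random
from itertools import accumulate

A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]

# 100 draws with replacement, each element weighted by its magnitude
samples = random.choices(A, weights=A, k=100)

# Equivalent call passing precomputed cumulative weights
cum_weights = list(accumulate(A))
samples_cum = random.choices(A, cum_weights=cum_weights, k=100)

# One printed number per experiment, as the problem requires
for number in samples:
    print(number)
```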