Repeating elements in a list so that the total count is equal

Question

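I have some code that creates data, and I then want to sample from that data.

My code first creates a series of vectors whose spacings are drawn from an exponential distribution (z_exponential_layers); each of these then becomes new2.

Then I take each new2 vector and check how many times dz fits into the interval between consecutive elements of new2.

For example, if new2 = [z1,z2,z3,...,zn], the second part of the code is meant to find how many times dz fits into each of [z2-z1,z3-z2,...]. So if (z2-z1)/dz = 5 = repeats, I store vec += [np.random.normal(0,1)]*repeats in the list and then move on to the next interval.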
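To make the example above concrete, here is a minimal sketch of what a single step does, assuming the interval happened to give repeats = 5 (that value is made up for illustration). The normal variate is drawn once and the one-element list holding it is then repeated, so all copies are the same number.

```python
import numpy as np

repeats = 5  # assumed: (z2 - z1) / dz came out as 5
vec = []
# The normal draw happens once; the list holding it is then repeated
vec += [np.random.normal(0, 1)] * repeats
print(len(vec), len(set(vec)))  # 5 elements, 1 distinct value
```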

import numpy as np
import random
import matplotlib.pyplot as plt

z_max = 100
dz = .01
#Intensity of process
lam = 0.1
#Number of rays
rays = 10
k = int(z_max/dz)
print('k =',k)
#List to store ray z coordinate data
exponential_procs_lists = []
for ray_n in range(2,rays):
    process_length = 10000
    #Compute data for z layers coordinate
    z_exponential_layers = np.cumsum(np.random.exponential(lam,size = process_length))
    #Cutoff values that lie outside z_max
    cutoff = [x for x in z_exponential_layers if x < z_max]
    #Append 0 at the start of exponential z vector
    if min(cutoff) == 0:
        new1 = np.insert(cutoff,0,0)
    else:
        new1 = cutoff
    #Append z_max at end of exponential z vector
    new2 = np.insert(new1,len(new1),z_max)
    #Append exponential ray z vector to list
    exponential_procs_lists.append(new2)

#Create list that will store random numbers data for each ray
big_list = []
#Loop over every ray, check how many dz lie within each layer and assign random variable (k total times)
for list_n in exponential_procs_lists:
    #Create empty list to store random data for each ray
    vec = []
    #Sum repeats checks that there is k elements in each vector, since k = int(z_max/dz)
    sum_repeats = 0
    #Calculate the intervals between each layers coordinate vector
    list_n_diff  = np.diff(list_n)
    for item in list_n_diff:
        #Calculate how many dz fit inside each interval
        repeats = int(item/dz)
        #Repeat random variable 'repeats' times. This ensures that if we sample x times and each
        #time we are in the same interval, that random variable is repeated
        vec += [np.random.normal(0,1)]*repeats
        #Update sum_repeats to check that there is K elements in the vector
        sum_repeats += repeats
    #Print to check sum_repeats equals k in each running of the whole calculation (we in first loop here)
    print('sum repeats =',sum_repeats)
    print('mean interval size =',np.mean(np.diff(new2)))
    #Append m(z) data to the main list, and repeat for each ray
    #Big list is a list of lists, so we must now transform it into a matrix form (np.array)
    big_list.append(vec)

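The problem is that when I run this code, the length of each vec containing the random variables is not equal to k, and it changes on every run. For example, one run gives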

k = 10000
sum repeats = 9507
mean interval size = 0.0992551287849846
sum repeats = 9493
mean interval size = 0.0992551287849846
sum repeats = 9500
mean interval size = 0.0992551287849846
sum repeats = 9500
mean interval size = 0.0992551287849846
sum repeats = 9479
mean interval size = 0.0992551287849846
sum repeats = 9508
mean interval size = 0.0992551287849846
sum repeats = 9509
mean interval size = 0.0992551287849846
sum repeats = 9485
mean interval size = 0.0992551287849846

How can I make sure the number of random elements in each vector equals k?

Answer 1

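For simplicity, assume dz = 1:

You are generating a list of random numbers (z_exponential_layers) that are always positive and each greater than the previous one (because of cumsum).

Then you cut off (cutoff) any number that is not below the upper limit (z_max).

So the numbers in the cutoff list lie in the range (0, z_max). The bounds are open because exponential draws are strictly positive and because the condition x < z_max is strict.

With that in mind, cutoff is [z_0, z_1, ..., z_n], where z_0 > 0, z_n < z_max, and z_(i+1) > z_i.

By using np.diff, you produce a vector of differences, each in the range (0, z_max), and the sum of all these values equals (z_n - z_0), which is necessarily less than z_max.

Add to that the fact that you truncate each difference (by using int(item/dz)), and the inequality only gets stronger:

repeats = z_n - z_0 - round_down_loss < z_max.

So, to get closer to repeats = z_max, you first need to make z_0 = 0 (that is, insert 0 if min(cutoff) > 0). Your result will then be repeats = z_n - z_0 - round_down_loss = z_n - round_down_loss < z_max

If you remove the rounding, you get:

repeats = z_n, which is still less than z_max.

If your randomly generated increments (from np.random.exponential) become infinitesimally small, then you approach it asymptotically:

repeats -> z_max

With this in mind, I propose the following correction: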
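The two losses described above can be seen in a small sketch with dz = 1. The boundary values below are hypothetical and were chosen only because they are exact in binary floating point; count_repeats is an illustrative helper that mirrors the question's inner loop.

```python
import numpy as np

dz = 1.0
# Hypothetical layer boundaries spanning [0, 10]
with_zero = np.array([0.0, 2.5, 6.25, 10.0])
# Same boundaries, but with the leading 0 missing
without_zero = with_zero[1:]

def count_repeats(edges, dz):
    """Sum of int(gap / dz) over consecutive gaps, as in the question's loop."""
    return sum(int(gap / dz) for gap in np.diff(edges))

full_span = int(with_zero[-1] / dz)
repeats_with_zero = count_repeats(with_zero, dz)
repeats_without_zero = count_repeats(without_zero, dz)
print(full_span, repeats_with_zero, repeats_without_zero)  # 10 8 6
```

With the leading 0 present, truncation alone drops the count from 10 to 8 (gaps of 2.5, 3.75, 3.75 become 2, 3, 3). Without it, the first interval is lost as well and the count falls to 6.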

if min(cutoff) > 0:  # Instead of equal
    new1 = np.insert(cutoff,0,0)
else:
    new1 = cutoff
...
for item in list_n_diff:
    #Calculate how many dz fit inside each interval
    # int() truncates each interval; dropping it removes that loss, but
    # note that repeating a list ([...]*repeats) still needs an integer count
    repeats = item/dz
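Since list repetition needs an integer count, one way to guarantee exactly k elements is to avoid per-interval counting altogether. The sketch below is an alternative, not the correction proposed above: it builds a fixed grid of k sample points and looks up which layer each point falls into with np.searchsorted. The function name sample_ray, its parameters, and the use of np.random.default_rng are assumptions made for illustration; it also assumes n_draws exponential gaps are enough to reach z_max.

```python
import numpy as np

def sample_ray(z_max=100.0, dz=0.01, lam=0.1, n_draws=10000, rng=None):
    """Return k = round(z_max / dz) values, constant within each random layer."""
    rng = np.random.default_rng() if rng is None else rng
    k = int(round(z_max / dz))
    # Layer boundaries: cumulative exponential gaps, kept strictly below z_max
    inner = np.cumsum(rng.exponential(lam, size=n_draws))
    inner = inner[inner < z_max]
    # Always start at 0 and end at z_max so the layers cover [0, z_max]
    edges = np.concatenate(([0.0], inner, [z_max]))
    # One normal draw per layer
    values = rng.normal(0.0, 1.0, size=len(edges) - 1)
    # Fixed sampling grid with exactly k points: 0, dz, 2*dz, ...
    grid = np.arange(k) * dz
    # Index of the layer that contains each grid point
    layer = np.searchsorted(edges, grid, side="right") - 1
    return values[layer]

vec = sample_ray(rng=np.random.default_rng(0))
print(len(vec))  # 10000
```

Because the output is indexed by the grid rather than assembled from rounded interval lengths, its length is k on every run, and grid points that fall in the same layer still share the same random value.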
