
How to generate correlated random numbers for truncated normal distribution in Python?

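I am trying to generate correlated random numbers for three variables using numpy.random.multivariate_normal(), with the mean vector and the covariance matrix (computed from data) as input.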

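The normal distribution is truncated between 0 and 1, so the generated random numbers (for all three variables) should lie between 0 and 1. However, some of the generated random numbers fall outside these bounds.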
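A minimal sketch that reproduces the problem. The mean vector and covariance matrix below are made-up illustrative values, not the ones computed from the real data:

```python
import numpy as np

# Hypothetical mean vector and covariance matrix (illustrative values only).
mean = np.array([0.5, 0.4, 0.6])
cov = np.array([
    [0.09, 0.03, 0.02],
    [0.03, 0.09, 0.03],
    [0.02, 0.03, 0.09],
])

rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mean, cov, size=10_000)

# Rows where at least one of the three variables falls outside [0, 1].
outside = np.any((samples < 0) | (samples > 1), axis=1)
print(f"{outside.mean():.1%} of the rows violate the [0, 1] bounds")
```

The plain multivariate normal has no notion of bounds, so with standard deviations of this size a noticeable share of rows is expected to land outside the interval.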

How can I control the bounds within which the normally distributed random numbers are generated for each variable?

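Edit: I can use scipy.stats.truncnorm to generate uncorrelated random numbers independently from three truncated normal distributions. However, here I am looking for something that can generate correlated random numbers.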
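For reference, the independent approach mentioned in the edit might look roughly like the sketch below. The means and standard deviations are hypothetical placeholders; note that scipy.stats.truncnorm takes its bounds a and b in standard-deviation units relative to loc:

```python
import numpy as np
from scipy.stats import truncnorm

# Hypothetical per-variable means and standard deviations (illustrative only).
means = np.array([0.5, 0.4, 0.6])
stds = np.array([0.3, 0.3, 0.3])
lower, upper = 0.0, 1.0

# truncnorm expects the bounds standardised as (bound - loc) / scale.
a = (lower - means) / stds
b = (upper - means) / stds

# One column per variable; each column is drawn independently of the others.
independent = truncnorm.rvs(a, b, loc=means, scale=stds,
                            size=(10_000, 3), random_state=0)

# All values respect the bounds, but the columns are essentially uncorrelated.
print(independent.min(), independent.max())
print(np.corrcoef(independent, rowvar=False).round(2))
```

This keeps every value inside [0, 1], but the off-diagonal correlations come out close to zero, which is exactly the limitation described above.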

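I am six years late, so I don't know how much you still need an answer. But some time ago I needed an answer to this question too, so I wrote a custom function for it. I figure this is a good place to leave it for future reference: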

import numpy as np


def BoundedMultivariateNormalDist(means, cov_matrix, dimenions_bounds=None, size=1, rng=None):
    """Custom function: Draw random samples from a multivariate (truly multi-dimensional) normal (Gaussian) distribution, optionally setting lower and upper bounds for each dimension of the distribution.
    
    Iteratively draws the needed number of samples and discards the samples outside the bounds until the requested sample size is reached.
    
    Parameters
    ----------
    means : ndarray of ints or floats
        means of the n distributions
    cov_matrix : 2d array (n by n) of ints or floats
        the covariance matrix of the n distributions
    dimenions_bounds : 2d (n by 2) array of ints or floats, optional
        rows are the dimensions, columns are the lower and upper bounds (in that order). Use np.nan for a bound that should stay open. Default is None (i.e. unbounded).
    size : (positive) int, optional
        number of samples to draw and return from the distribution. Default is 1.
    rng : numpy.random.Generator, optional
        random number generator used to draw the samples.


    Returns
    -------
    out : ndarray
        Array of samples from the multivariate normal distribution. If size is 1 (or not specified) a single array (of size n) is returned.
    
    Author: Andre3582 
    Created on: 13-06-2020
    Last revised on: 17-08-2021"""
    
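    # rng defaults to None, so create a generator when none is passed in
    if rng is None:
        rng = np.random.default_rng()

    # convert means and cov matrix to np.array
    means = np.array(means)
    cov_matrix = np.array(cov_matrix)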
    # check if dimensions agree
    if not means.shape[0] == cov_matrix.shape[0]:
        raise ValueError("dimensions of means and cov matrix do not agree")
    if not cov_matrix.shape[0] == cov_matrix.shape[1]:
        raise ValueError("cov matrix must be square (n by n)")

    ndims = means.shape[0]

    # if no dimenions_bounds is provided, make a dimenions_bounds filled with np.nan
    if dimenions_bounds is None:
        dimenions_bounds = np.full((ndims, 2), np.nan)  # ndims x 2 array of np.nan values
    # accept nested lists as well as arrays
    dimenions_bounds = np.asarray(dimenions_bounds, dtype=float)

    # dimenions_bounds should be a (ndims x 2) 2d array where each row represents a dimension,
    # where the first column (index=0) holds the lower bound
    # and the second column (index=1) holds the upper bound
    if not dimenions_bounds.shape == (ndims, 2):
        raise ValueError("shape of dimenions_bounds does not match the dimension of means")
    
    # define a local size
    local_size = size

    # create an empty array
    return_samples = np.empty([0,ndims])

    # generate new samples until the needed size is reached
    while return_samples.shape[0] < size:

        # draw as many samples as are still missing
        samples = rng.multivariate_normal(means, cov_matrix, size=local_size)

        # samples is a (local_size x n) array: each row is one draw from the
        # n-dimensional distribution, each column holds the values of one dimension

        # keep only the rows that are within the lower and upper bounds in every dimension:
        # for each column we check if the values are within the bounds of that column

        for dim, bounds in enumerate(dimenions_bounds):

            # keep only the samples that are bigger than the lower bound
            if not np.isnan(bounds[0]): # bounds[0] is the lower bound
                samples = samples[(samples[:,dim] > bounds[0])]  # samples[:,dim] is the column of the dim

            # keep only the samples that are smaller than the upper bound
            if not np.isnan(bounds[1]): # bounds[1] is the upper bound
                samples = samples[(samples[:,dim] < bounds[1])]   # samples[:,dim] is the column of the dim


        # append the accepted samples to the return samples
        return_samples = np.vstack([return_samples, samples])

        # get new size which is the difference between the requested size and the size so far.
        local_size = size - return_samples.shape[0]
    
    # return a single value when the requested size is 1 (or not specified)
    if return_samples.shape[0] == 1:
        return return_samples[0]
    # otherwise 
    else:
        return return_samples
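
Because the function works by rejection, its run time depends on how often a raw draw lands inside all bounds, and the retained samples no longer have exactly the input means and covariances. The following is a rough, self-contained sketch for checking both before relying on the function. The mean vector, covariance matrix and bounds are made-up illustrative values, and the filtering is written out inline rather than calling the function above:

```python
import numpy as np

# Hypothetical inputs (illustrative values only).
mean = np.array([0.5, 0.4, 0.6])
cov = np.array([
    [0.09, 0.03, 0.02],
    [0.03, 0.09, 0.03],
    [0.02, 0.03, 0.09],
])
bounds = np.array([[0.0, 1.0]] * 3)  # one (lower, upper) row per dimension

rng = np.random.default_rng(0)
raw = rng.multivariate_normal(mean, cov, size=50_000)

# A row is accepted only if every dimension is strictly inside its bounds.
inside = np.all((raw > bounds[:, 0]) & (raw < bounds[:, 1]), axis=1)
kept = raw[inside]

# Share of raw draws that survive; a low value means many loop iterations.
acceptance_rate = inside.mean()
print(f"acceptance rate: {acceptance_rate:.1%}")

# Compare the correlation structure before and after truncation.
print(np.corrcoef(raw, rowvar=False).round(2))
print(np.corrcoef(kept, rowvar=False).round(2))
```

If the acceptance rate turns out very low (bounds far out in the tails, or many dimensions), the rejection loop can take a long time, and the correlations of the kept samples can drift away from those implied by cov_matrix.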
