
How do I generate correlated random numbers in Python?

How do I create a set of n vectors of dimensionality d such that the elements within each vector have correlation c (i.e., if a vector has one large element, its other elements are likely to be large too)?

For demonstration, let's say n=5, d=3, and c=0.7.

Is there some way to set up cov here: https://numpy.org/doc/stable/reference/random/generated/numpy.random.multivariate_normal.html

This may be too much to ask, but what if I want the numbers drawn from a normal distribution?

Thanks!

Edit: Basically I'm trying to create a synthetic population whose individuals differ in some latent variable, and ideally this latent variable would follow a normal distribution. For instance, the psychometric g factor summarizes performance on multiple tests, and explains a certain amount of the variance between people on a given test. So I'd like to create n vectors (population size) of dimensionality d (number of tasks), but maybe c needs to be a vector of d numbers? And I might need to specify a vector of n numbers for the latent variable scores (e.g., g, one per individual), or maybe that emerges from how the vectors for the individuals are created?

Might this be what you are looking for?

import numpy as np


def gen_random(n: int, d: int, covar: float) -> np.ndarray:
    """
    Parameters
    ---------
    n : int
        number of samples generated
    d : int
        dimensionality of samples
    covar : float
        covariance shared by every pair of dimensions; each variance
        is 1, so this is also the pairwise correlation
    
    Returns
    -------
    samples : np.ndarray
        samples as an (n, d) matrix
    """
    # covar everywhere off the diagonal, unit variance on the diagonal
    cov_mat = np.full((d, d), covar, dtype=float)
    np.fill_diagonal(cov_mat, 1)
    offset = np.zeros(d)

    return np.random.multivariate_normal(offset, cov_mat, size=n)


v = gen_random(n=10_000, d=3, covar=0.7)
print(v)
# [[ 0.03031736  0.18227023 -0.1302022 ]
#  [-0.17770689  0.70979971 -0.74631702]
#  [-0.78485455 -0.73942846 -0.04819704]
#  ...
#  [ 2.5928135   2.43727782  1.59459156]
#  [ 0.33443158 -0.74126937 -0.7542286 ]
#  [ 0.11238505 -0.1940429   0.7397402 ]]

# sanity check
print(np.corrcoef(v, rowvar=False))
# [[1.         0.6985444  0.69802535]
#  [0.6985444  1.         0.70168241]
#  [0.69802535 0.70168241 1.        ]]
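
Regarding the edit to the question (a latent variable such as g, with c possibly differing per task): one way to sketch this is a one-factor model, where each individual gets a normally distributed latent score and each task gets its own loading on it. This is only an illustration under stated assumptions, not something taken from the answer above: the helper name gen_one_factor and the loadings parameter are made up for the example, and every task is assumed to have unit variance. Under those assumptions, the correlation between tasks j and k is loadings[j] * loadings[k], so equal loadings of sqrt(0.7) should give pairwise correlations near 0.7.

```python
import numpy as np


def gen_one_factor(n, loadings, seed=None):
    """Hypothetical helper: draw n individuals from a one-factor model.

    Each score is x[i, j] = loadings[j] * g[i] + sqrt(1 - loadings[j]**2) * e[i, j],
    with g and e independent standard normals. Every task then has unit
    variance and corr(x_j, x_k) = loadings[j] * loadings[k] for j != k.
    """
    rng = np.random.default_rng(seed)
    loadings = np.asarray(loadings, dtype=float)
    if np.any(np.abs(loadings) > 1):
        raise ValueError("loadings must lie in [-1, 1]")

    g = rng.standard_normal(n)                        # latent score per individual
    noise = rng.standard_normal((n, loadings.size))   # task-specific noise
    scores = g[:, None] * loadings + noise * np.sqrt(1.0 - loadings**2)
    return scores, g


# Equal loadings of sqrt(0.7) target a pairwise correlation of 0.7.
loadings = np.full(3, np.sqrt(0.7))
scores, g = gen_one_factor(n=10_000, loadings=loadings, seed=0)
print(np.corrcoef(scores, rowvar=False).round(2))
```

Passing unequal loadings, for example [0.9, 0.7, 0.5], would make some tasks track the latent variable more closely than others, which may be closer to what "c as a vector of d numbers" was asking for.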

