
How to generate random numbers correlated to a given dataset in Python

I have a 20-element array, x, of floating point numbers, for example:

x = [ 0.35945087, 0.08999019, 0.51313128, 0.75455967, 0.50654956, 0.12404178, 0.25115332, 0.94167661, 0.95727792, 0.35572299, 0.65264679, 0.09416763, 0.861585, 0.19661212, 0.62882119, 0.1180147, 0.17153433, 0.07275386, 0.01895795, 0.00578392]

This data is not normally distributed; rather, it follows a power law distribution.

I need to generate a second array, y, that is correlated with x with a correlation coefficient of 0.70.

How do I do this with Python?

This is one of those things that sounds very easy to ask, but is complicated when you get down to the details. I can only point you in the right direction rather than give you a straightforward recipe.

Notionally, what you need to do is construct a bivariate distribution whose marginal distributions are both power laws (presumably the same power law) and whose correlation coefficient has the desired value.

(X, Y) ~ f(x, y) such that X ~ powerlaw(params); Y ~ powerlaw(params); corr(X, Y) = 0.7

This can be done through a copula.

For each sample x[i] that you have, you find the univariate conditional distribution Y ~ f(x=x[i], y) and sample from it.
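One way to make the copula idea concrete is a Gaussian copula, sketched below with only the standard library. This is an illustration under stated assumptions, not a definitive recipe: the marginal of y is assumed to be a power law on [0, 1] with CDF y**ALPHA, the exponent ALPHA = 0.5 is a made-up value, and the CDF of x is approximated by its ranks. RHO here is the correlation of the latent normal variables, so the Pearson correlation of the resulting (x, y) sample will generally differ from 0.70, especially with only 20 points.

```python
import math
import random
from statistics import NormalDist

x = [0.35945087, 0.08999019, 0.51313128, 0.75455967, 0.50654956,
     0.12404178, 0.25115332, 0.94167661, 0.95727792, 0.35572299,
     0.65264679, 0.09416763, 0.861585, 0.19661212, 0.62882119,
     0.1180147, 0.17153433, 0.07275386, 0.01895795, 0.00578392]

RHO = 0.70    # target correlation on the latent Gaussian scale
ALPHA = 0.5   # hypothetical power-law exponent for y (assumption)


def gaussian_copula_partner(x, rho, alpha, rng):
    """Draw one y[i] per x[i] from the conditional of a Gaussian copula."""
    n = len(x)
    std_normal = NormalDist()

    # Rank-based empirical CDF of x, kept strictly inside (0, 1).
    order = sorted(range(n), key=lambda i: x[i])
    u_x = [0.0] * n
    for rank, i in enumerate(order):
        u_x[i] = (rank + 0.5) / n

    y = []
    for u in u_x:
        z_x = std_normal.inv_cdf(u)  # map x to the latent normal scale
        # Conditional draw: Z_y | Z_x = z_x ~ N(rho * z_x, 1 - rho**2)
        z_y = rho * z_x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        u_y = std_normal.cdf(z_y)    # back to a uniform on [0, 1]
        y.append(u_y ** (1 / alpha))  # inverse CDF of the assumed power law
    return y


rng = random.Random(0)  # fixed seed so the sketch is reproducible
y = gaussian_copula_partner(x, RHO, ALPHA, rng)
```

If the parameters of the power law behind x are known, the rank-based step can be replaced by the exact CDF of x.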

Note that the correlation coefficient is probably not particularly meaningful when applied to power law distributions. Power law distributions do not in general have finite first and second moments.

y = [number * 0.7 for number in x]

Is this what you need?
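Note that multiplying x by a positive constant leaves the Pearson correlation at 1.0, not 0.70, because the coefficient is invariant to scaling. If only the sample correlation matters and y does not have to follow a power law, a rough sketch is to mix the centered x with noise that has been made orthogonal to it. The helper names pearson and correlated_with are hypothetical, and the resulting y is centered at zero, so it can be negative.

```python
import math
import random

x = [0.35945087, 0.08999019, 0.51313128, 0.75455967, 0.50654956,
     0.12404178, 0.25115332, 0.94167661, 0.95727792, 0.35572299,
     0.65264679, 0.09416763, 0.861585, 0.19661212, 0.62882119,
     0.1180147, 0.17153433, 0.07275386, 0.01895795, 0.00578392]


def pearson(a, b):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def correlated_with(x, rho, rng):
    """Return y whose sample correlation with x equals rho (up to rounding)."""
    n = len(x)
    mean_x = sum(x) / n
    xc = [v - mean_x for v in x]               # centered x

    noise = [rng.gauss(0, 1) for _ in x]
    mean_noise = sum(noise) / n
    nc = [v - mean_noise for v in noise]       # centered noise

    # Remove the part of the noise that lies along xc.
    beta = sum(p * q for p, q in zip(xc, nc)) / sum(p * p for p in xc)
    resid = [q - beta * p for p, q in zip(xc, nc)]

    # Normalise both directions, then mix them with weights rho and sqrt(1 - rho^2).
    norm_x = math.sqrt(sum(p * p for p in xc))
    norm_r = math.sqrt(sum(r * r for r in resid))
    w = math.sqrt(1 - rho ** 2)
    return [rho * p / norm_x + w * r / norm_r for p, r in zip(xc, resid)]


scaled = [number * 0.7 for number in x]        # pure scaling of x
y = correlated_with(x, 0.70, random.Random(0))
r_scaled = pearson(x, scaled)                  # 1.0 up to rounding
r_mixed = pearson(x, y)                        # 0.70 up to rounding
```

Any affine transform a * y + b with a > 0 keeps the correlation, so y can be shifted and rescaled afterwards to a more convenient range.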
