
How to generate random numbers correlated to a given dataset in Python

I have a 20-element array, x, of floating-point numbers, for example:

x = [ 0.35945087, 0.08999019, 0.51313128, 0.75455967, 0.50654956, 0.12404178, 0.25115332, 0.94167661, 0.95727792, 0.35572299, 0.65264679, 0.09416763, 0.861585, 0.19661212, 0.62882119, 0.1180147, 0.17153433, 0.07275386, 0.01895795, 0.00578392]

This data is not normally distributed; rather, it follows a power law distribution.

I need to generate a second array, y, that is correlated with x, with a correlation coefficient of 0.70.

How do I do this with Python?

This is one of those things that sounds very easy to ask, but is complicated when you get down to the details. I can only point you in the right direction rather than give you a straightforward recipe.

Notionally, what you need to do is construct a bivariate distribution whose marginal distributions are both power laws (presumably the same power law) and whose correlation coefficient has the desired value.

(X, Y) ~ f(x, y)  s.t.  X ~ powerlaw(params);  Y ~ powerlaw(params);  corr(X, Y) = 0.7

This can be done through a copula.

For each sample x[i] that you have, you find the univariate conditional distribution Y ~ f(x=x[i], y) and sample from it.
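The steps above can be sketched with a Gaussian copula. This is only an illustration under stated assumptions, not a definitive recipe: the function name `gaussian_copula_partner` is hypothetical, the empirical distribution of x stands in for the unknown power law (so y is drawn from the same observed range as x), and x is assumed to contain no ties. Note also that `rho` here is the correlation on the latent Gaussian scale; the Pearson coefficient between x and y will generally differ from it, and with only 20 points it will vary noticeably from one draw to the next.

```python
from statistics import NormalDist

import numpy as np


def gaussian_copula_partner(x, rho, rng):
    """Draw y conditionally on x through a Gaussian copula (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    std_normal = NormalDist()

    # Map each x[i] to a uniform score in (0, 1) via its rank (assumes no ties).
    ranks = np.argsort(np.argsort(x)) + 1
    u_x = ranks / (n + 1)

    # Move to the latent Gaussian scale.
    z_x = np.array([std_normal.inv_cdf(u) for u in u_x])

    # Conditional draw: Z_y | Z_x = z  ~  Normal(rho * z, 1 - rho**2).
    z_y = rho * z_x + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)

    # Back to uniform scores, then to the data scale through the
    # empirical quantile function of x (assumed marginal for y).
    u_y = np.array([std_normal.cdf(z) for z in z_y])
    return np.quantile(x, u_y)


x = [0.35945087, 0.08999019, 0.51313128, 0.75455967, 0.50654956,
     0.12404178, 0.25115332, 0.94167661, 0.95727792, 0.35572299,
     0.65264679, 0.09416763, 0.861585, 0.19661212, 0.62882119,
     0.1180147, 0.17153433, 0.07275386, 0.01895795, 0.00578392]

rng = np.random.default_rng(0)
y = gaussian_copula_partner(x, 0.7, rng)
print("sample Pearson correlation:", np.corrcoef(x, y)[0, 1])
```

If a parametric power law has been fitted to x, its quantile function could replace `np.quantile(x, ...)` in the last step; the rest of the sketch stays the same.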

Note that the correlation coefficient is probably not particularly meaningful when applied to power law distributions. Power law distributions do not in general have finite first and second moments, and the correlation coefficient is defined in terms of those moments.

y = [number * 0.7 for number in x]

Is this what you need?
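Multiplying x by a positive constant leaves the Pearson correlation at 1, since the coefficient is invariant under positive rescaling. A minimal sketch of one alternative follows, assuming only the sample Pearson coefficient matters and the distribution of y does not (the result is not power-law distributed and can be negative). The function name `with_exact_pearson` is hypothetical. It mixes the centered x with noise that has been made orthogonal to x, so the sample correlation equals the target up to floating-point error.

```python
import numpy as np


def with_exact_pearson(x, r, rng):
    """Return y whose sample Pearson correlation with x is r (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()

    # Random noise, centered, with its component along xc projected out.
    e = rng.standard_normal(x.size)
    e = e - e.mean()
    e = e - (e @ xc) / (xc @ xc) * xc

    # Unit-length versions of both directions.
    xs = xc / np.linalg.norm(xc)
    es = e / np.linalg.norm(e)

    # Mix the two orthogonal directions so that corr(x, y) == r.
    return r * xs + np.sqrt(1.0 - r**2) * es


x = [0.35945087, 0.08999019, 0.51313128, 0.75455967, 0.50654956,
     0.12404178, 0.25115332, 0.94167661, 0.95727792, 0.35572299,
     0.65264679, 0.09416763, 0.861585, 0.19661212, 0.62882119,
     0.1180147, 0.17153433, 0.07275386, 0.01895795, 0.00578392]

rng = np.random.default_rng(0)
y = with_exact_pearson(x, 0.7, rng)

print("scaled copy:", np.corrcoef(x, 0.7 * np.asarray(x))[0, 1])
print("mixed with noise:", np.corrcoef(x, y)[0, 1])
```

Any transformation of the form a * y + b with a > 0 can be applied afterwards to shift or rescale y without changing the coefficient.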
