How to generate bivariate categorical variables with defined correlation?
Suppose I have two categorical variables A and B, each with three levels, 1, 2 and 3, which occur with probabilities 0.2, 0.3 and 0.5 respectively. How could I generate a list of random bivariate data for A and B with a defined correlation of 0.3? I know that for a univariate A or B we can do
A <- sample(1:3, 100, replace = TRUE, prob = c(0.2, 0.3, 0.5))
B <- sample(1:3, 100, replace = TRUE, prob = c(0.2, 0.3, 0.5))
My question is: how can I sample cbind(A, B) so that cor(A, B) = 0.3?
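As a quick illustration of why the univariate calls are not enough, here is a short sketch (not from the original post) that draws A and B independently. The marginals come out right, but the sample correlation only fluctuates around 0. The seed and sample size are arbitrary choices.

```r
set.seed(42)                 # arbitrary seed, for reproducibility only
p <- c(0.2, 0.3, 0.5)        # marginal probabilities of levels 1, 2, 3
n <- 100000
A <- sample(1:3, n, replace = TRUE, prob = p)
B <- sample(1:3, n, replace = TRUE, prob = p)  # drawn independently of A

# The marginals match p, but the correlation is near 0 rather than 0.3
round(prop.table(table(A)), 2)
round(cor(A, B), 3)
```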
Here's an example probability matrix. (Yours will depend on your chosen model. For this question, its row and column sums should be 0.2, 0.3 and 0.5, and its entries should imply cor(A, B) = 0.3.)
# describe probabilities over the space by a matrix
set.seed(1)
nA <- 3
nB <- 3
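# random positive weights normalized to sum to 1 (a placeholder model;
# rows index the levels of A, columns the levels of B)
probmat <- matrix({r <- runif(nA*nB); r/sum(r)}, ncol = nB)
# printing probmat gives:
#            [,1]       [,2]      [,3]
# [1,] 0.04868724 0.16654119 0.1732284
# [2,] 0.06823764 0.03698311 0.1211728
# [3,] 0.10504609 0.16474081 0.1153628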
And here's one way to draw samples from it:
# rearrange
probs <- c(probmat)
events <- as.matrix(expand.grid(A=1:nA,B=1:nB))
# draw samples
nSamp <- 100
samp <- events[sample.int(nA*nB,nSamp,prob=probs,replace=TRUE),]
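The random probmat above is only a placeholder. One simple model that meets the question's requirements is a mixture (my own choice of model, not the answerer's): with probability rho set B = A, otherwise draw B independently of A. Both components have marginals (0.2, 0.3, 0.5). Because the means are equal, the covariance is rho times Var(A), and since Var(A) = Var(B) the correlation is exactly rho. The sketch below builds that matrix, computes its exact correlation, and then samples from it the same way as above. The helper names mixture_probmat and probmat_cor are hypothetical, and the model only covers 0 <= rho <= 1. Other models, such as the latent-normal approach, give different joint tables with the same marginals and correlation.

```r
# Hypothetical helper: joint probability matrix of (A, B) under a mixture model,
# P = (1 - rho) * p p^T + rho * diag(p), which keeps both marginals equal to p.
# Assumes 0 <= rho <= 1 and identical marginals for A and B.
mixture_probmat <- function(p, rho) {
  stopifnot(rho >= 0, rho <= 1, abs(sum(p) - 1) < 1e-12)
  (1 - rho) * outer(p, p) + rho * diag(p)
}

# Hypothetical helper: exact Pearson correlation of the level codes implied by a joint matrix
probmat_cor <- function(probmat, a = seq_len(nrow(probmat)), b = seq_len(ncol(probmat))) {
  pa <- rowSums(probmat); pb <- colSums(probmat)   # marginals of A and B
  ma <- sum(a * pa);      mb <- sum(b * pb)        # means
  cov_ab <- sum(outer(a - ma, b - mb) * probmat)   # E[(A - ma)(B - mb)]
  cov_ab / sqrt(sum((a - ma)^2 * pa) * sum((b - mb)^2 * pb))
}

p <- c(0.2, 0.3, 0.5)
probmat <- mixture_probmat(p, rho = 0.3)
rowSums(probmat)        # should equal p up to rounding
colSums(probmat)        # should equal p up to rounding
probmat_cor(probmat)    # 0.3 up to floating-point error

# Draw samples the same way as in the answer above
set.seed(1)             # arbitrary seed
nA <- nB <- 3
events <- as.matrix(expand.grid(A = 1:nA, B = 1:nB))
nSamp <- 100000
samp <- events[sample.int(nA * nB, nSamp, prob = c(probmat), replace = TRUE), ]
cor(samp[, "A"], samp[, "B"])   # close to 0.3 for large nSamp
```

c(probmat) and expand.grid() both run through A fastest, so each probability lines up with its (A, B) pair.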
Below is R code equivalent to the MATLAB example in this article. You can then apply pnorm() and cut() to the columns to get correlated discrete random variables.
# parameters
nrows <- 10
# The desired correlation matrix
(M <- matrix(c(1.0,0.6,0.3,
0.6,1.0,0.5,
0.3,0.5,1.0),byrow=TRUE,ncol=3))
#> [,1] [,2] [,3]
#> [1,] 1.0 0.6 0.3
#> [2,] 0.6 1.0 0.5
#> [3,] 0.3 0.5 1.0
(U = chol(M))
#> [,1] [,2] [,3]
#> [1,] 1 0.6 0.3000000
#> [2,] 0 0.8 0.4000000
#> [3,] 0 0.0 0.8660254
# generate a random matrix whose columns have the desired correlation structure
matrix(rnorm(nrows*ncol(M)),ncol=ncol(M))%*%U
# illustrative output; no seed is set, so your values will differ
#> -0.4326 -0.4089 0.0505
#> -1.6656 -0.4187 -1.3665
#> 0.1253 -0.3955 0.4209
#> 0.2877 1.9192 2.3656
#> -1.1465 -0.7970 -0.9976
#> 1.1909 0.8057 1.1459
#> 1.1892 1.5669 1.8695
#> -0.0376 0.0248 -1.3678
#> 0.3273 0.1199 -1.1880
#> 0.1746 -0.5611 0.2141
# check that this works
cor(matrix(rnorm(1000000*ncol(M)),ncol=ncol(M))%*%U)
#> [,1] [,2] [,3]
#> [1,] 1.0000000 0.5988445 0.2987633
#> [2,] 0.5988445 1.0000000 0.4992603
#> [3,] 0.2987633 0.4992603 1.0000000
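The answer stops before the pnorm()/cut() step. Below is a sketch of that step for the two-variable case in the question, under my own assumptions. It draws bivariate normals with a latent correlation r via chol(), maps each column to Uniform(0, 1) with pnorm(), and then applies cut() at the cumulative probabilities 0, 0.2, 0.5, 1, so levels 1, 2, 3 occur with probabilities 0.2, 0.3, 0.5. Discretizing shrinks the correlation, so the latent r has to be larger than the target 0.3. The hypothetical helpers to_categories and cat_cor find that r with uniroot() on a simulated correlation. The same normal draws are reused for every r, so the objective is a deterministic function of r. The calibrated r is therefore a Monte Carlo estimate, not an exact value.

```r
p <- c(0.2, 0.3, 0.5)                 # target marginal probabilities of levels 1, 2, 3
breaks <- c(0, cumsum(p))             # 0, 0.2, 0.5, 1 on the uniform scale
target <- 0.3                         # desired correlation of the categorical codes

# Hypothetical helper: correlated normals -> pnorm() (uniform) -> cut() (levels 1..3)
to_categories <- function(z, r) {
  U <- chol(matrix(c(1, r, r, 1), 2)) # upper-triangular factor, as in the answer
  x <- z %*% U                        # columns now have correlation r
  apply(pnorm(x), 2, function(u) as.integer(cut(u, breaks, labels = FALSE,
                                                include.lowest = TRUE)))
}

set.seed(1)                           # arbitrary seed
n <- 100000
z <- matrix(rnorm(n * 2), ncol = 2)   # fixed draws reused for every r

# Hypothetical calibration: find the latent r whose discretized correlation hits the target
cat_cor <- function(r) { AB <- to_categories(z, r); cor(AB[, 1], AB[, 2]) }
r_latent <- uniroot(function(r) cat_cor(r) - target, c(0, 0.99))$root

AB <- to_categories(z, r_latent)
colnames(AB) <- c("A", "B")
r_latent                              # latent normal correlation, above 0.3
prop.table(table(AB[, "A"]))          # approximately 0.2, 0.3, 0.5
cor(AB[, "A"], AB[, "B"])             # approximately 0.3
```

For more than two variables, the same pnorm()/cut() mapping applies column by column to the output of the chol() code above. Each pairwise latent correlation would then need its own calibration.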