Generate viable sampling distributions of discrete data in R

I'm trying to simulate 2 X 2 data that would yield a relatively strong negative phi coefficient.

I'm using the GenOrd library as follows:

library(GenOrd)

# Specify sample size N
N <- 40

# Marginal distribution
marginal <- list(c(.5), c(.5))

# Matrix
Sigma <- matrix(c(1.0, -.71, -.71, 1.0), 2, 2, byrow=TRUE)

# Generate a sample of the categorical variables with specified parameters
m <- ordsample(N, marginal, Sigma)

However, I get the following error whenever I enter a correlation stronger (more negative) than -.70.

Error in contord(list(marginal[[q]], marginal[[r]]), matrix(c(1, Sigma[q,  : 
Correlation matrix not valid!

I'm clearly specifying something untenable somewhere, but I don't know what it is.

Help appreciated.

I'll take a stab at answering this as a coding question. The error points to where the package first detects the problem: your Sigma entry. Given your marginal distribution, the package treats -.71 in your correlation matrix as out of bounds, and it is telling you so. You can see this by flipping the sign in your Sigma:

Sigma <- matrix(c(1.0, .71, .71, 1.0), 2, 2, byrow=TRUE)
m <- ordsample(N, marginal, Sigma)
> m
       [,1] [,2]
  [1,]    1    1
  [2,]    1    2
  ....

As to WHY -.71 is not valid, you may want to direct that statistical question to Cross Validated for a succinct answer.
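As a partial pointer on the "why", the theoretical range of phi for two binary variables can be worked out from their marginals alone (the Fréchet bounds). The sketch below uses base R only; `phi_bounds` is a made-up helper name, and `p` and `q` stand for P(X = 1) and P(Y = 1). It says nothing about GenOrd's internal checks, only about what is attainable in principle.

```r
# Theoretical (Frechet) bounds on phi for two binary variables.
# p and q are P(X = 1) and P(Y = 1); phi_bounds is a hypothetical helper
# written for this illustration, not part of GenOrd.
phi_bounds <- function(p, q) {
  denom   <- sqrt(p * (1 - p) * q * (1 - q))
  p11_min <- max(0, p + q - 1)   # smallest possible P(X = 1, Y = 1)
  p11_max <- min(p, q)           # largest possible P(X = 1, Y = 1)
  c(lower = (p11_min - p * q) / denom,
    upper = (p11_max - p * q) / denom)
}

phi_bounds(0.5, 0.5)   # lower = -1, upper = 1
phi_bounds(0.2, 0.7)   # unequal marginals give a narrower, asymmetric range
```

With .5/.5 marginals the attainable range is the full [-1, 1], so a phi of -.71 is feasible in principle. That hints the rejection may come from how the package arrives at its intermediate correlation matrix rather than from the marginals themselves, though that is an inference and not something confirmed here.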

I'm not exactly sure why; however, I had no problems simulating 2 X 2 data with a relatively strong negative correlation using the generate.binary() function from the MultiOrd package.

For example, the following code works across the complete range of correlation inputs. The documentation for generate.binary() indicates that the specified matrix is interpreted as a tetrachoric correlation matrix.

library(MultiOrd)

# Specify sample size N
N <- 40

# Marginal distribution for two variables as a vector for MultiOrd rather than a list
marginal <- c(.5, .5)

# Correlation (tetrachoric) matrix as target for simulated relationship between variables
Sigma <- matrix(c(1.0, -.71, -.71, 1.0), 2, 2, byrow=TRUE)

# Generate a sample of N observations of the binary variables with the specified parameters
m <- generate.binary(N, marginal, Sigma)
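One caveat worth illustrating: a tetrachoric correlation is the correlation of the latent normal variables, not the phi coefficient of the resulting 0/1 data. The base-R sketch below does not call MultiOrd; it assumes the binary variables come from thresholding correlated standard normals at 0 (which matches .5/.5 marginals) and compares the simulated phi with the known closed form for that case, phi = (2/pi) * asin(rho). The variable names are illustrative.

```r
# Base-R sketch: dichotomize two correlated standard normals at 0
# and compare the resulting phi with the latent (tetrachoric) correlation.
set.seed(1)
n   <- 1e5
rho <- -0.71                                  # latent (tetrachoric) correlation

z1 <- rnorm(n)
z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)   # cor(z1, z2) is rho in expectation

x1 <- as.integer(z1 > 0)                      # P(x1 = 1) = .5
x2 <- as.integer(z2 > 0)                      # P(x2 = 1) = .5

phi_sim    <- cor(x1, x2)                     # Pearson r on 0/1 data is phi
phi_theory <- (2 / pi) * asin(rho)            # about -0.50 for rho = -0.71

# Latent correlation that would be needed for a target phi of -0.71
target_phi <- -0.71
rho_needed <- sin(pi * target_phi / 2)        # about -0.90

c(phi_sim = phi_sim, phi_theory = phi_theory, rho_needed = rho_needed)
```

Under these assumptions, feeding -.71 in as a tetrachoric correlation would give data with a phi near -.50, so it may be worth checking `cor(m[, 1], m[, 2])` on the generated sample and raising the input toward -.90 if a phi close to -.71 is the goal.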
