Point biserial and p-value

I am trying to get a point biserial correlation between a continuous vocabulary score and syntactic productivity (dichotomous: productive vs not_productive).

I tried both the ltm package

> biserial.cor (lol$voc1_tvl, lol$synt, use = c("complete.obs")) 

and the polycor package

> polyserial( lol$voc1_tvl, lol$synt, ML = FALSE, control = list(), std.err = FALSE, maxcor=.9999, bins=4)

The problem is that neither test gives me a p-value.

How could I run a point biserial correlation test and get the associated p-value or, alternatively, calculate the p-value myself?

Since the point biserial correlation is just a particular case of the popular Pearson's product-moment coefficient, you can use cor.test to approximate (more on that later) the correlation between a continuous X and a dichotomous Y. For example, given the following data:

set.seed(23049)
x <- rnorm(1e3)
y <- sample(0:1, 1e3, replace = TRUE)

Running cor.test(x, y) will give you the information you want.

    Pearson's product-moment correlation

data:  x and y
t = -1.1971, df = 998, p-value = 0.2316
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.09962497  0.02418410
sample estimates:
        cor 
-0.03786575
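
For anyone who would rather calculate the p-value by hand, the t statistic that cor.test reports can be rebuilt from the correlation and the sample size alone. The following is a minimal sketch, assuming the same simulated x and y as above; the names r, n, t_stat and p_value are only illustrative.

```r
set.seed(23049)
x <- rnorm(1e3)
y <- sample(0:1, 1e3, replace = TRUE)

r <- cor(x, y)   # point-biserial correlation = Pearson's r with a 0/1 variable
n <- length(x)   # number of complete (x, y) pairs

# t statistic for H0: rho = 0, with n - 2 degrees of freedom
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

c(t = t_stat, p = p_value)
```

Both values should agree with the t and p-value fields printed by cor.test(x, y) for the same data.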

As an indication of the similarity between the coefficients, notice how the calculated correlation of -0.03786575 is similar to what ltm::biserial.cor gives you:

> library(ltm)
> biserial.cor(x, y, level = 2)
[1] -0.03784681

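The difference lies in the divisor used for the standard deviations. biserial.cor appears to mix the two conventions: the dichotomous variable enters through a population-style term, sqrt(p(1 - p)), which corresponds to dividing by n, while the standard deviation of the continuous variable is divided by n - 1. cor and cor.test use n - 1 throughout, so the two results differ by a factor of sqrt((n - 1)/n), which is negligible at this sample size.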
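
The effect of the divisor can be checked in base R without any extra package. This sketch assumes the usual textbook mean-difference formula for the point-biserial coefficient (it is not copied from the ltm source), and pb_pop and pb_mixed are illustrative names.

```r
set.seed(23049)
x <- rnorm(1e3)
y <- sample(0:1, 1e3, replace = TRUE)
n <- length(x)

p <- mean(y == 1)                              # proportion of 1s in y
diff_mu <- mean(x[y == 1]) - mean(x[y == 0])   # difference between group means

sd_pop <- sqrt(mean((x - mean(x))^2))          # standard deviation dividing by n

pb_pop   <- diff_mu * sqrt(p * (1 - p)) / sd_pop   # divisor n throughout
pb_mixed <- diff_mu * sqrt(p * (1 - p)) / sd(x)    # sd(x) divides by n - 1

all.equal(pb_pop, cor(x, y))                        # same value as Pearson's r
all.equal(pb_mixed, cor(x, y) * sqrt((n - 1) / n))  # shrunk by sqrt((n - 1) / n)
```

With n = 1000 the factor sqrt(999/1000) is about 0.9995, which is in line with the ratio of the two coefficients printed above (-0.03784681 / -0.03786575).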

As cgage noted, you can also use the polyserial() function, which in my example would yield

> polyserial(x, y, std.err = TRUE)

Polyserial Correlation, 2-step est. = -0.04748 (0.03956)
Test of bivariate normality: Chisquare = 1.891, df = 5, p = 0.864

Here, I believe the difference in the calculated correlation (-0.04748) arises because polyserial assumes a normally distributed latent variable underlying Y and estimates the correlation between X and that latent variable. That makes it a biserial rather than a point-biserial coefficient, and it is larger in magnitude than Pearson's r for the same data.
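
If a p-value for the polyserial estimate itself is wanted, one rough option is a Wald-type z test built from the estimate and its standard error. This is only a sketch under a large-sample normality assumption, and it is not something polyserial reports; the two numbers are copied from the output above.

```r
est <- -0.04748   # polyserial estimate copied from the output above
se  <-  0.03956   # its standard error from the same output

z <- est / se                   # Wald z statistic for H0: rho = 0
p_value <- 2 * pnorm(-abs(z))   # two-sided p-value, normal approximation

round(c(z = z, p = p_value), 3)
```

The p-value printed on the "Test of bivariate normality" line answers a different question (whether the bivariate normal model fits), so it should not be read as a test of the correlation.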

Using the ggplot2 dataset mpg as a reproducible example:

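library(ggplot2)  # provides the mpg dataset
library(polycor)  # provides polyserial()

# Use class as the dichotomous variable (must subset to two levels)
newData <- subset(mpg, class == "midsize" | class == "compact")

# Polyserial correlation with its standard error and normality test
polyserial(newData$cty, newData$class, std.err = TRUE)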

Setting std.err = TRUE in polyserial prints the estimate together with its standard error and a test of bivariate normality.
