
Element wise prop.test in R

I'm trying to create a function that does an element-wise prop.test in R between the x1 and x2 variables and returns a list of p-values, one per test. x1 and x2 represent the number of successes in each category. I was thinking sapply would do the trick, but I cannot figure out how to get it to work.

set.seed(4576)

x1 <- round(runif(15, 200, 1000))
x2 <- round(runif(15, 200, 1000))

p <- cbind(x1, x2)

       x1  x2
 [1,] 919 559
 [2,] 471 975
 [3,] 537 792
 [4,] 776 524
 [5,] 329 603
 [6,] 201 610
 [7,] 520 353
 [8,] 461 853
 [9,] 491 765
[10,] 527 358
[11,] 248 331
[12,] 953 322
[13,] 453 680
[14,] 401 654
[15,] 962 358

function(data) {

    n1 <- sum(data[,1])
    n2 <- sum(data[,2])

    sapply(data, function(x) {
        prop.test(x = c(data[,1], data[,2]), n = c(n1, n2))$p.value
    })

}

I'm probably just misunderstanding how to use sapply but any help would be appreciated!
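(For context on why the attempt above fails: `sapply` applied directly to a matrix treats it as one long vector and loops over every individual element, not over rows. A minimal illustration, using a toy matrix `m` invented here for demonstration:)

```r
# sapply on a matrix iterates over each element of the underlying
# vector, not over rows -- the function never sees a whole row.
m <- cbind(a = 1:3, b = 4:6)
out <- sapply(m, function(x) x)
length(out)  # 6 (one result per element), not 3 (one per row)
```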

Probably easiest to sapply over the row indices; then you don't have to extract every value from p manually.

sapply(1:nrow(p), function(z) prop.test(p[z,, drop = FALSE])$p.value)
#  [1] 9.810393e-21 6.072933e-40 3.228340e-12 3.366985e-12 3.807659e-19 1.487836e-46 1.929026e-08 3.988440e-27 1.327621e-14 1.630269e-08 6.548799e-04
# [12] 1.141069e-69 1.891166e-11 8.598155e-15 7.322714e-62

It is not exactly clear what your data represent, but I'm assuming in the above that the two columns in p are counts of successes and failures, respectively.

This matters because R will actually execute a different proportion test depending on exactly what data structure you supply. Example:

> sapply(1:nrow(p), function(z) prop.test(p[z,, drop = FALSE], n = colSums(p))$p.value)
 [1] 9.810393e-21 6.072933e-40 3.228340e-12 3.366985e-12 3.807659e-19 1.487836e-46 1.929026e-08 3.988440e-27 1.327621e-14 1.630269e-08 6.548799e-04 1.141069e-69
[13] 1.891166e-11 8.598155e-15 7.322714e-62
> sapply(1:nrow(p), function(z) prop.test(p[z,, drop = TRUE], n = colSums(p))$p.value)
 [1] 7.981801e-28 6.509059e-37 6.883520e-10 8.391497e-17 1.044857e-16 1.291989e-43 3.079194e-11 3.329273e-24 3.663355e-12 2.373325e-11 5.689494e-03 5.212655e-84
[13] 2.658030e-09 1.781938e-12 2.023293e-75

These numbers are all floating-point representations of 0, so the difference is irrelevant in this case, but if you look at a single iteration of each of these two calls you'll see what R is doing differently, and thus why it gives you different p-values:

> prop.test(p[1,, drop = FALSE], n = colSums(p))

        1-sample proportions test with continuity correction

data:  p[1, , drop = FALSE], null probability 0.5
X-squared = 87.1996, df = 1, p-value < 2.2e-16
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.5964359 0.6464965
sample estimates:
        p 
0.6217862 

> prop.test(p[1,, drop = TRUE], n = colSums(p))

        2-sample test for equality of proportions with continuity correction

data:  p[1, , drop = TRUE] out of colSums(p)
X-squared = 119.5388, df = 1, p-value < 2.2e-16
alternative hypothesis: two.sided
95 percent confidence interval:
 0.03879812 0.05605522
sample estimates:
    prop 1     prop 2 
0.11140744 0.06398077

Supplying the n argument actually doesn't matter when drop = FALSE (i.e., when you supply a matrix), because the test being performed is then a comparison of the two numbers in the row.
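You can verify that n is ignored for matrix input (this matches `?prop.test`, which says n is unused when x is a two-column matrix). A quick check, reconstructing p from the question's seed:

```r
set.seed(4576)
x1 <- round(runif(15, 200, 1000))
x2 <- round(runif(15, 200, 1000))
p <- cbind(x1, x2)

# With drop = FALSE, x is a 1x2 matrix of (successes, failures),
# so prop.test runs a 1-sample test and ignores n entirely:
with_n    <- prop.test(p[1, , drop = FALSE], n = colSums(p))$p.value
without_n <- prop.test(p[1, , drop = FALSE])$p.value
identical(with_n, without_n)  # TRUE
```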

It sounds like that is not what you want, so you should use drop = TRUE (which is the default, so you don't actually have to supply it) but specify n, as in the second call above.
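Putting it together, here is a sketch of the wrapper function the question asked for (`prop_pvalues` is my name for it, not from the original post), assuming each row should be a two-sample test of x1[i]/sum(x1) against x2[i]/sum(x2):

```r
set.seed(4576)
x1 <- round(runif(15, 200, 1000))
x2 <- round(runif(15, 200, 1000))
p  <- cbind(x1, x2)

# One two-sample proportion test per row: data[i, ] drops to a length-2
# vector of successes, compared against the column totals in n.
prop_pvalues <- function(data) {
  n <- colSums(data)
  sapply(seq_len(nrow(data)), function(i)
    prop.test(data[i, ], n = n)$p.value)
}

pv <- prop_pvalues(p)  # numeric vector of 15 p-values
```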
