I'm trying to create a function that does an element-wise prop.test in R between the x1 and x2 variables and returns a list of p-values for each test. x1 and x2 represent the number of successes in each category. I was thinking that sapply would do the trick, but I cannot figure out how to get it to work.
set.seed(4576)
x1 <- round(runif(15, 200, 1000))
x2 <- round(runif(15, 200, 1000))
p <- cbind(x1, x2)
x1 x2
[1,] 919 559
[2,] 471 975
[3,] 537 792
[4,] 776 524
[5,] 329 603
[6,] 201 610
[7,] 520 353
[8,] 461 853
[9,] 491 765
[10,] 527 358
[11,] 248 331
[12,] 953 322
[13,] 453 680
[14,] 401 654
[15,] 962 358
function(data) {
  n1 <- sum(data[,1])
  n2 <- sum(data[,2])
  sapply(data, function(x) {
    prop.test(x = c(data[,1], data[,2]), n = c(n1, n2))$p.value
  })
}
I'm probably just misunderstanding how to use sapply but any help would be appreciated!
Probably easiest to sapply over the row indices; then you don't have to extract every value from p manually.
sapply(1:nrow(p), function(z) prop.test(p[z,, drop = FALSE])$p.value)
# [1] 9.810393e-21 6.072933e-40 3.228340e-12 3.366985e-12 3.807659e-19 1.487836e-46 1.929026e-08 3.988440e-27 1.327621e-14 1.630269e-08 6.548799e-04
# [12] 1.141069e-69 1.891166e-11 8.598155e-15 7.322714e-62
It is not exactly clear what your data represent, but I'm assuming in the above that the two columns in p are counts of successes and failures, respectively.
This matters because R will actually execute a different proportion test depending on exactly what data structure you supply. Example:
> sapply(1:nrow(p), function(z) prop.test(p[z,, drop = FALSE], n = colSums(p))$p.value)
[1] 9.810393e-21 6.072933e-40 3.228340e-12 3.366985e-12 3.807659e-19 1.487836e-46 1.929026e-08 3.988440e-27 1.327621e-14 1.630269e-08 6.548799e-04 1.141069e-69
[13] 1.891166e-11 8.598155e-15 7.322714e-62
> sapply(1:nrow(p), function(z) prop.test(p[z,, drop = TRUE], n = colSums(p))$p.value)
[1] 7.981801e-28 6.509059e-37 6.883520e-10 8.391497e-17 1.044857e-16 1.291989e-43 3.079194e-11 3.329273e-24 3.663355e-12 2.373325e-11 5.689494e-03 5.212655e-84
[13] 2.658030e-09 1.781938e-12 2.023293e-75
These numbers are all floating-point representations of 0, so the difference in this case is irrelevant, but if you take a look at a single iteration of these two calls you'll see what R is doing differently and thus why it is giving you different p-values:
> prop.test(p[1,, drop = FALSE], n = colSums(p))
1-sample proportions test with continuity correction
data: p[1, , drop = FALSE], null probability 0.5
X-squared = 87.1996, df = 1, p-value < 2.2e-16
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.5964359 0.6464965
sample estimates:
p
0.6217862
> prop.test(p[1,, drop = TRUE], n = colSums(p))
2-sample test for equality of proportions with continuity correction
data: p[1, , drop = TRUE] out of colSums(p)
X-squared = 119.5388, df = 1, p-value < 2.2e-16
alternative hypothesis: two.sided
95 percent confidence interval:
0.03879812 0.05605522
sample estimates:
prop 1 prop 2
0.11140744 0.06398077
Supplying the n argument actually doesn't matter if drop = FALSE (i.e., if you supply a matrix), because the test it performs is a comparison of the two numbers in the row. It sounds like that is not what you want, so you should use drop = TRUE (which is the default, so you don't actually have to supply it) but specify n, as I do in the second set of code above.
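Putting that together, here is a minimal self-contained sketch of the recommended approach (rebuilding p from the question's seed; since drop = TRUE is the default, p[z, ] can be written without it):

```r
set.seed(4576)
x1 <- round(runif(15, 200, 1000))  # successes in group 1
x2 <- round(runif(15, 200, 1000))  # successes in group 2
p <- cbind(x1, x2)

# p[z, ] drops to a plain length-2 vector of success counts, and
# n = colSums(p) supplies the group totals, so each call runs the
# 2-sample test for equality of proportions
pvals <- sapply(1:nrow(p), function(z) prop.test(p[z, ], n = colSums(p))$p.value)
```

This returns one p-value per row of p, matching the second set of output above.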