Dear programming gods,
I would like to perform a series of Chi-square tests in R (one test for each column of my species Presence/Absence data.frame) using a function that can yield a single matrix (or data.frame, ideally) which lists as output the species (column name), Chi-square test statistic, df, and p.value.
My species data snippet (actual dimensions = 50x131):
Species<-structure(list(Acesac = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 1L
), Allpet = c(0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L), Ambser = c(0L,
0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L), Anoatt = c(0L, 0L, 0L, 1L, 0L,
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
0L, 1L, 1L, 1L), Aritri = c(0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L,
0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L
)), .Names = c("Acesac", "Allpet", "Ambser", "Anoatt", "Aritri"
), row.names = c("BS1", "BS10", "BS2", "BS3", "BS4", "BS5", "BS6",
"BS7", "BS8", "BS9", "LC1", "LC10", "LC2", "LC3", "LC4", "LC5",
"LC6", "LC7", "LC8", "LC9", "TR1", "TR10", "TR2", "TR3", "TR4"
), class = "data.frame")
My environmental data snippet:
Env<-structure(list(Rock = structure(1:25, .Label = c("BS1", "BS10",
"BS2", "BS3", "BS4", "BS5", "BS6", "BS7", "BS8", "BS9", "LC1",
"LC10", "LC2", "LC3", "LC4", "LC5", "LC6", "LC7", "LC8", "LC9",
"TR1", "TR10", "TR2", "TR3", "TR4", "TR5", "TR6", "TR7", "TR8",
"TR9", "WD1", "WD10", "WD2", "WD3", "WD4", "WD5", "WD6", "WD7",
"WD8", "WD9", "WW1", "WW10", "WW2", "WW3", "WW4", "WW5", "WW6",
"WW7", "WW8", "WW9"), class = "factor"), Climbed = structure(c(1L,
2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L), .Label = c("climbed", "unclimbed"
), class = "factor")), .Names = c("Rock", "Climbed"), row.names = c(NA,
25L), class = "data.frame")
The following apply function code performs a chi-sq test on each species (column) by first creating a contingency table with the number of occurrences of a given species on climbed vs. unclimbed rocks (Env$Climbed).
apply(Species, 2, function(x) {
  Table <- table(Env$Climbed, x)
  Test <- chisq.test(Table, corr = TRUE)
  out <- data.frame("Chi.Square" = round(Test$statistic, 3)
                    , "df" = Test$parameter
                    , "p.value" = round(Test$p.value, 3)
  )
})
This yields a separate data.frame for each species (column). I would like to yield one data.frame, which includes also the column name of each species. Something like this:
mydf<-data.frame("spp"= colnames(Species[1:25,]), "Chi.sq"=c(1:25), "df"=
c(1:25),"p.value"= c(1:25))
Should this be done with ddply or adply? Or just a loop? (I tried, but failed.) I reviewed a posting on a similar topic (Chi Square Analysis using for loop in R), but could not make it work for my purposes.
Thank you for your time and expertise! TC
If you save the result of your apply call as
kk <- apply(Species, 2, function(x) {...})
Then you can finish the transformation with
do.call(rbind, Map(function(x,y) cbind(x, species=y), kk, names(kk)))
Here we just append the name of the species to each data.frame and combine all the rows with rbind.
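To see what the Map/rbind step does in isolation, here is a minimal sketch on a toy stand-in for kk. The list contents and the names spA and spB are made up for illustration and are not results from the question's data.

```r
# Toy stand-in for `kk`: a named list of one-row data.frames (made-up values).
kk <- list(
  spA = data.frame(Chi.Square = 0.10, df = 1, p.value = 0.75),
  spB = data.frame(Chi.Square = 2.50, df = 1, p.value = 0.11)
)

# Map() pairs each data.frame with its list name; cbind() adds the name as a column.
labelled <- Map(function(x, y) cbind(x, species = y), kk, names(kk))

# do.call(rbind, ...) stacks the one-row data.frames into a single data.frame.
combined <- do.call(rbind, labelled)
combined
```

The result has one row per list element, with the element's name carried along in the species column.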
You can also try
kk <- apply(Species,2,....)
library(plyr)
ldply(kk,.id='spp')
spp Chi.Square df p.value
1 Acesac 0.000 1 1.000
2 Allpet 0.000 1 1.000
3 Ambser 0.000 1 1.000
4 Anoatt 0.338 1 0.561
5 Aritri 0.085 1 0.770
Update: the same result with ddply, after melting the data to long format:
library(plyr)
library(reshape2)
ddply(setNames(melt(Species), c("spp", "value")), .(spp), function(x) {
  Test <- chisq.test(table(Env$Climbed, x$value), correct = TRUE)
  data.frame(Chi.Square = round(Test$statistic, 3),
             df = Test$parameter,
             p.value = round(Test$p.value, 3))
})
Don't use apply on data.frames. It internally coerces to a matrix, which can have unintended consequences for some data structures (e.g. factors). It is also not memory-efficient.
If you want to apply a function by column, use lapply (as a data.frame is a list).
You can use plyr::ldply to automatically return a data.frame, not a list.
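The coercion is easy to check for yourself. The sketch below uses a made-up two-column data.frame (the names df, site and count are hypothetical) and compares the column classes that apply and sapply actually see.

```r
# Hypothetical data.frame mixing a factor column and an integer column.
df <- data.frame(
  site  = factor(c("a", "b", "a")),
  count = c(1L, 2L, 3L)
)

# apply() calls as.matrix() first; with a factor present, the whole
# matrix becomes character, so every column is seen as character.
classes_apply <- apply(df, 2, class)

# sapply() (like lapply()) walks the columns as list elements,
# so each column keeps its own class.
classes_sapply <- sapply(df, class)

classes_apply
classes_sapply
```

With all-integer presence/absence columns the coercion is harmless, but the list-based functions avoid the surprise if a non-numeric column ever sneaks in.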
# rewrite the function so `Env$Climbed` is not hard coded....
my_fun <- function(x, y) {
  # contingency table of the grouping variable against presence/absence
  Table <- table(y, x)
  Test <- chisq.test(Table, correct = TRUE)
  # one-row data.frame with the rounded test results
  data.frame("Chi.Square" = round(Test$statistic, 3),
             "df" = Test$parameter,
             "p.value" = round(Test$p.value, 3))
}
library(plyr)
results <- ldply(Species,my_fun, y = Env$Climbed)
results
# .id Chi.Square df p.value
# 1 Acesac 0.000 1 1.000
# 2 Allpet 0.000 1 1.000
# 3 Ambser 0.000 1 1.000
# 4 Anoatt 0.338 1 0.561
# 5 Aritri 0.085 1 0.770
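If you would rather not depend on plyr, the same idea can be sketched in base R with lapply and do.call(rbind, ...). The data here are made up so the block runs on its own, and chisq_row, spA and spB are hypothetical names; substitute your own Species and Env$Climbed.

```r
# Hypothetical presence/absence data: 40 rocks, 2 species.
Climbed <- factor(rep(c("climbed", "unclimbed"), each = 20))
Species <- data.frame(
  spA = c(rep(1L, 15), rep(0L, 5), rep(1L, 5), rep(0L, 15)),
  spB = rep(c(0L, 1L), times = 20)
)

# One-row data.frame of chi-square results for a single species column.
chisq_row <- function(x, y) {
  Test <- chisq.test(table(y, x), correct = TRUE)
  data.frame(Chi.Square = round(unname(Test$statistic), 3),
             df = unname(Test$parameter),
             p.value = round(Test$p.value, 3))
}

# lapply() iterates over columns; the list names become the spp column.
rows <- lapply(Species, chisq_row, y = Climbed)
results <- data.frame(spp = names(rows), do.call(rbind, rows), row.names = NULL)
results
```

This gives one row per species with the species name in the first column, the same shape as the ldply output.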