Nested loop in R to correlate columns of df1 with columns of df2

I have two datasets with abundance data from groups of different species. Columns are species and rows are sites. The sites (rows) are identical between the two datasets, and I am trying to correlate each column of the first dataset with each column of the second to see whether the correlation is positive or negative.

library(Hmisc)
rcorr(otu.table.filter$sp1, new6$spA, type = "spearman")$P
rcorr(otu.table.filter$sp1, new6$spA, type = "spearman")$r

The first call gives me the p-value of the relationship between sp1 and spA, and the second gives the r value.

I initially created a loop that checks all species of one data frame against a single column of the other. Needless to say, to cover everything this way I would have to repeat the process a few hundred times. My simple loop for one column of df1 (new6) against all columns of df2 (otu.table.filter):

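## p-value matrix (2 x 2) for each column of otu.table.filter vs new6$Total
pvalues = list()
for (i in 1:ncol(otu.table.filter)) {
  pvalues[[i]] <- (rcorr(otu.table.filter[, i], new6$Total, type = "spearman"))$P
}

## correlation matrix (2 x 2) for each column of otu.table.filter vs new6$Total
rvalues = list()
for (i in 1:ncol(otu.table.filter)) {
  rvalues[[i]] <- (rcorr(otu.table.filter[, i], new6$Total, type = "spearman"))$r
}

## element 2 of each 2 x 2 matrix is the off-diagonal entry, i.e. the value for the pair
p <- NULL
for (i in 1:length(pvalues)) {
  tmp <- print(pvalues[[i]][2])
  p <- rbind(p, tmp)
}
r <- NULL
for (i in 1:length(rvalues)) {
  tmp <- print(rvalues[[i]][2])
  r <- rbind(r, tmp)
}

## FDR-adjusted p-values, then one table with r, p and fdr
fdr <- as.matrix(p.adjust(p, method = "fdr", n = length(p)))
sprman <- cbind(r, p, fdr)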

Using the above as a starting point, I tried to create a nested loop that examines the first column of df1 against all columns of df2, then proceeds to the second column of df1 against all columns of df2, and so on.

But here I am a bit lost, and I could not find a solution for this in R.

I would assume that the pvalues output should be a list of the form

pvalues[[i]][[j]]

and similarly the rvalues output

rvalues[[i]][[j]]

but I don't know how to do that. I tried:

pvalues = list()
rvalues = list()
for (j in 1:7) {
  for (i in 1:ncol(otu.table.filter)) {
    pvalues[[i]][[j]] <- (rcorr(otu.table.filter[, i], new7[, j], type = "spearman"))$P
  }
  for (i in 1:ncol(otu.table.filter)) {
    rvalues[[i]][[j]] <- (rcorr(otu.table.filter[, i], new7[, j], type = "spearman"))$r
  }
}

but I cannot make it work because I am not sure how to direct the output into the lists. I would also appreciate help with the next part: extracting the p and r value for each comparison and applying the FDR correction (similar to what I did with my simple loop).

Here is a subset of my two data frames:

Here is a small demo. Let's assume two matrices x and y, each with n rows (the sample size). The correlations and approximate p-values can then be estimated as:

n <- 100
x <- matrix(rnorm(10 * n), nrow = n)
y <- matrix(rnorm(5 * n), nrow = n)

## correlation matrix
r <- cor(x, y, method = "spearman")

## p-values
pval <- function(r, n) 2 * (1 - pt(abs(r)/sqrt((1 - r^2)/(n - 2)), n - 2))
pval(r, n)

## for comparison
cor.test(x[,1], y[,1], method = "spearman", exact = FALSE)

More details can be found here: https://stats.stackexchange.com/questions/312216/spearman-correlation-significancy-test
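The question also asks for FDR-adjusted p-values. The following is a minimal sketch, assuming the correlation and p-value matrices are computed as in the demo above; the seed and the names fdr and res are illustrative and not part of the original post. The whole p-value matrix is passed to p.adjust, so all pairwise tests are corrected together rather than column by column.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
n <- 100
x <- matrix(rnorm(10 * n), nrow = n)
y <- matrix(rnorm(5 * n), nrow = n)

## Spearman correlations and t-approximation p-values, as in the demo
r <- cor(x, y, method = "spearman")
pval <- function(r, n) 2 * (1 - pt(abs(r) / sqrt((1 - r^2) / (n - 2)), n - 2))
p <- pval(r, n)

## adjust all ncol(x) * ncol(y) p-values together, then restore the
## matrix shape (rows = columns of x, columns = columns of y)
fdr <- matrix(p.adjust(p, method = "fdr"), nrow = nrow(p))

## long table: one row per pair of columns
res <- data.frame(
  x_col = as.vector(row(r)),     # column index in x
  y_col = as.vector(col(r)),     # column index in y
  r     = as.vector(r),
  p     = as.vector(p),
  fdr   = as.vector(fdr)
)
head(res[order(res$p), ])        # pairs with the smallest raw p-values first
```

With real data frames, as.matrix(otu.table.filter) and as.matrix(new7) would presumably take the place of x and y, and colnames() could replace the integer indices in res.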

Edit

And finally, a loop with cor.test:

## p-value for every pair of columns, using cor.test
p <- matrix(NA, nrow = ncol(x), ncol = ncol(y))
for (i in 1:ncol(x)) {
  for (j in 1:ncol(y)) {
    p[i, j] <- cor.test(x[, i], y[, j], method = "spearman")$p.value
  }
}

p

The values differ somewhat because the first approach uses the t-approximation, whereas the second uses the "exact AS 89 algorithm" of cor.test.
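To check that the difference really comes from the exact algorithm, the loop can be rerun with exact = FALSE, which makes cor.test fall back to the asymptotic t-approximation. This is only a sketch on simulated data without ties; the seed and the names p_t and p_loop are illustrative.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
n <- 100
x <- matrix(rnorm(10 * n), nrow = n)
y <- matrix(rnorm(5 * n), nrow = n)

## t-approximation applied to the whole correlation matrix at once
r <- cor(x, y, method = "spearman")
p_t <- 2 * (1 - pt(abs(r) / sqrt((1 - r^2) / (n - 2)), n - 2))

## the same tests via cor.test, with the exact algorithm switched off
p_loop <- matrix(NA_real_, nrow = ncol(x), ncol = ncol(y))
for (i in seq_len(ncol(x))) {
  for (j in seq_len(ncol(y))) {
    p_loop[i, j] <- cor.test(x[, i], y[, j], method = "spearman",
                             exact = FALSE)$p.value
  }
}

## compare the two matrices up to numerical tolerance
all.equal(p_t, p_loop)
```

If the two matrices agree, the vectorised version can stand in for the loop whenever the t-approximation is acceptable, which matters when there are hundreds of columns.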
