
Apply function to each cell across multiple dataframes in R

Say that I have N identical (same number of rows and columns) dataframes:

set.seed(2)
df1 <- data.frame(replicate(100,rnorm(100)))
df2 <- data.frame(replicate(100,rnorm(100)))
dfN <- data.frame(replicate(100,rnorm(100)))

And I want to apply a function (in this case t.test()) across each "cell" of the N dataframes, so that what returns is a separate dataframe that contains a t value for each cell test performed. Essentially, I want to take the first cell of each dataframe,

one <- df1[1,1]
two <- df2[1,1]
Nth <- dfN[1,1]

Perform a t.test() on those cells,

first.cell.each <- cbind.data.frame(one,two,Nth)
t.test(first.cell.each, mu=0)

And repeat that across all cells (in this case 10000).

edit: clarified

We can create a matrix with the same dimensions as the individual datasets to store the p-values from t.test. Then loop over the row and column indices, extract the corresponding element from each dataset, concatenate them, run t.test, and assign the p-value to the same row/column index of 'res'.

res <- matrix(NA_real_, nrow = nrow(df1), ncol = ncol(df1))
for (i in seq_len(nrow(df1))) {
  for (j in seq_len(ncol(df1))) {
    # one-sample t-test on the (i, j) cell of every data frame
    res[i, j] <- t.test(c(df1[i, j], df2[i, j], dfN[i, j]), mu = 0)$p.value
  }
}

This returns a 100 x 100 matrix:

str(res)
#num [1:100, 1:100] 0.629 0.5 0.131 0.769 0.348 ...
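
The loop above hard-codes three data frames. As a rough sketch of how it might generalise to any number of them, the same double loop can be wrapped in a function that takes a list. The name cellwise_pvalues and the small 4 x 5 demo data are my own assumptions, not something from the question:

```r
# Hypothetical helper (name is an assumption): cell-wise one-sample t-test
# across a list of data frames that all share the same dimensions.
cellwise_pvalues <- function(dfs, mu = 0) {
  stopifnot(length(dfs) >= 2)  # t.test needs at least two observations
  dims <- dim(dfs[[1]])
  stopifnot(all(vapply(dfs, function(d) identical(dim(d), dims), logical(1))))

  res <- matrix(NA_real_, nrow = dims[1], ncol = dims[2])
  for (i in seq_len(dims[1])) {
    for (j in seq_len(dims[2])) {
      # one value per data frame at position (i, j)
      x <- vapply(dfs, function(d) d[i, j], numeric(1))
      res[i, j] <- t.test(x, mu = mu)$p.value
    }
  }
  res
}

# Small demo data: three 4 x 5 data frames
set.seed(2)
dfs <- replicate(3, data.frame(replicate(5, rnorm(4))), simplify = FALSE)
pvals <- cellwise_pvalues(dfs)
dim(pvals)
```

With the data from the question, the call would be cellwise_pvalues(list(df1, df2, dfN)).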

If there are many datasets, we can place them in a list, convert the list to an array, and run t.test over each cell with apply:

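lst <- mget(paste0("df", c(1, 2, "N")))  # collect df1, df2, dfN by name
ar1 <- array(unlist(lst), dim = c(dim(df1), length(lst)))  # rows x cols x N
# move the dataset dimension first, then test each (row, col) cell
res2 <- apply(aperm(ar1, c(3, 1, 2)), c(2, 3), FUN = function(x) t.test(x, mu = 0)$p.value)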
str(res2)
# num [1:100, 1:100] 0.629 0.5 0.131 0.769 0.348 ...
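
Once the data are in an array, the one-sample t statistic can also be computed for all cells at once instead of calling t.test 10000 times. This is only a sketch of that idea on small demo data (the 4 x 5 size and the variable names are my assumptions). It uses the usual formula t = (mean - mu) / sqrt(var / n) with n - 1 degrees of freedom and gives both the t values and the two-sided p-values:

```r
# Small demo data: three 4 x 5 data frames
set.seed(2)
dfs <- replicate(3, data.frame(replicate(5, rnorm(4))), simplify = FALSE)

# Stack into a rows x cols x N array
ar <- array(unlist(dfs, use.names = FALSE),
            dim = c(dim(dfs[[1]]), length(dfs)))
n  <- dim(ar)[3]
mu <- 0

# Mean and sample variance over the third (dataset) dimension
m <- rowMeans(ar, dims = 2)
v <- rowSums((ar - as.vector(m))^2, dims = 2) / (n - 1)

# One-sample t statistic and two-sided p-value for every cell
t_stat <- (m - mu) / sqrt(v / n)
p_val  <- 2 * pt(-abs(t_stat), df = n - 1)

# Compare against t.test applied cell by cell
all.equal(p_val, apply(ar, c(1, 2), function(x) t.test(x, mu = mu)$p.value))
```

This only covers the default two-sided one-sample test; other alternatives or confidence intervals would still need t.test itself.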

Suppose all your data frames are stored in a list datlst; then this does the job:

z <- matrix(tapply(unlist(datlst, use.names = FALSE),
                   rep(gl(prod(dim(datlst[[1]])), 1), length(datlst)),
                   FUN = function (u) t.test(u, mu = 0)$p.value),
            nrow = nrow(datlst[[1]]))

With your example data frames, datlst <- list(df1, df2, dfN), this returns a 100 x 100 matrix:

str(z)
# num [1:100, 1:100] 0.629 0.5 0.131 0.769 0.348 ...
