
How to apply a function that takes two vectors from an array across the whole array and store the output in a new array in R

I have a function that takes two vectors and returns two scalars. The vectors are slices of a large array. I want to apply the function over the whole array, but I didn't manage to do it with the apply family. I'm sure it's possible; I just haven't grasped how to combine apply with my own function, and I couldn't find an answer I understood among similar questions here. The loop below gives the desired result but is very slow (my real data is bigger than in this example and the function is more complex). Any solutions are appreciated!

# function receives two vectors and returns two scalars (intercept and slope)
fnd <- function(depths, temps) {
  return(lm(depths ~ temps)$coefficients)
}

d1 <- 20
stdt <- as.Date("2023-02-01")
endt <- stdt + d1 - 1
Time00 <- seq(stdt,endt,"day")

# input array: 2 variables x 10 observations x d1 days
ar1 <- array(data = runif(2 * 10 * d1), dim = c(2, 10, d1),
             dimnames = list(c("Depth", "Temp"), 0:9, as.character(Time00)))

# prepare output array (row "b" = intercept, row "a" = slope)
res_ar <- array(data = NA, dim = c(2, d1), dimnames = list(c("b", "a"), as.character(Time00)))

# this loop gives the desired result but is inefficient
for (i in 1:d1) {
  res_ar[,i] <- fnd(ar1[1,,i],ar1[2,,i])
}

I've taken some liberties and reshaped the input data so it is more amenable to vectorized processing. Hope this is helpful and a bit faster!

set.seed(1234)

# function to fit linear model
fnd <- function(depths, temps) return (lm(depths ~ temps)$coefficients)

# script settings
d1 <- 20
stdt <- as.Date("2023-02-01")
endt <- stdt + d1 - 1
Time00 <- seq(stdt, endt, "day")

# let's use a data.frame rather than an array as our input data structure
ar1_df <- data.frame(matrix(data = runif(2 * 10 * d1), ncol = 2),
                     rep(as.character(Time00), each = 10)) |>
  `colnames<-`(c('depth', 'temp', 'date'))

# split by date and then fit linear model on each chunk of data
res_list <- split(ar1_df, ar1_df$date) |> 
  lapply(function(x) fnd(x$depth, x$temp))

# now recombine results
res_df <- lapply(res_list, t) |> 
  lapply(as.data.frame) |> 
  data.table::rbindlist()
res_df$Date <- names(res_list)
colnames(res_df) <- c('intercept', 'slope', 'date')

# display results
print(res_df)
     intercept       slope       date
 1: 0.40919951  0.14724501 2023-02-01
 2: 0.22583549  0.38403572 2023-02-02
 3: 0.22270543  0.50704609 2023-02-03
 4: 0.40679160  0.11900251 2023-02-04
 5: 0.56456859 -0.08287233 2023-02-05
 6: 0.56961409 -0.31717291 2023-02-06
 7: 0.22248106  0.39467534 2023-02-07
 8: 0.03923064  0.72326353 2023-02-08
 9: 0.76582883 -0.47078536 2023-02-09
10: 0.38752880 -0.10163376 2023-02-10
11: 0.16397783  0.10675503 2023-02-11
12: 0.49584710  0.04543302 2023-02-12
13: 0.66707078 -0.16270759 2023-02-13
14: 0.72488747 -0.13292537 2023-02-14
15: 0.64390907 -0.05862607 2023-02-15
16: 0.22880075  0.48624960 2023-02-16
17: 0.24411048  0.28118653 2023-02-17
18: 0.39614071  0.19488079 2023-02-18
19: 0.41691043  0.22280767 2023-02-19
20: 0.87571017 -0.23966196 2023-02-20
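If you'd rather not pull in data.table just for the recombination step, a base-R variant is sketched below. It is a minimal sketch under stated assumptions: the stand-in `ar1_df` is rebuilt the same way as above, and `do.call(rbind, ...)` is swapped in for `data.table::rbindlist()`; it stacks the named coefficient vectors into a matrix whose row names are the dates.

```r
# Stand-in data built like ar1_df above (columns depth, temp, date)
set.seed(1234)
d1 <- 20
Time00 <- seq(as.Date("2023-02-01"), by = "day", length.out = d1)
ar1_df <- data.frame(matrix(runif(2 * 10 * d1), ncol = 2),
                     rep(as.character(Time00), each = 10))
colnames(ar1_df) <- c("depth", "temp", "date")

fnd <- function(depths, temps) coef(lm(depths ~ temps))

# One model per date; each list element is a named length-2 vector
res_list <- lapply(split(ar1_df, ar1_df$date),
                   function(x) fnd(x$depth, x$temp))

# Base-R recombination: rbind the vectors row-wise (row names = dates)
res_mat <- do.call(rbind, res_list)
res_df <- data.frame(date = rownames(res_mat),
                     intercept = unname(res_mat[, 1]),
                     slope = unname(res_mat[, 2]),
                     row.names = NULL)
print(res_df)
```

Because the ISO date strings sort chronologically, `split()` returns the groups in date order.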

After some additional reading and testing, I came up with the following: (1) split the array into a list of 2 x 10 matrices, one per day, (2) adjust the function to accept such a matrix, and (3) use sapply on the list.

# adjusted function: takes one 2 x 10 slice (row 1 = depth, row 2 = temp)
fnd2 <- function(ar2) {
  return(lm(ar2[1, ] ~ ar2[2, ])$coefficients)
}

# turn into list
arlist <- lapply(seq(dim(ar1)[3]), function(x) ar1[,,x])

# apply the function over the whole list
res1 <- sapply(arlist,fnd2)
dimnames(res1) <- list(c("b", "a"), as.character(Time00))

This is 5% faster than the loop. Still, is there a way to avoid conversion into a list to make the code faster?
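On avoiding the list: `apply()` accepts the array directly with `MARGIN = 3` and hands each 2 x 10 slice to the function, so no list has to be built by hand (option (a) below). It will probably not be dramatically faster, since most of the time is likely spent inside `lm()` itself. If the real function is a simple linear regression, the coefficients can instead be computed for all days at once from the closed-form least-squares formulas, slope = cov(temp, depth) / var(temp) and intercept = mean(depth) - slope * mean(temp) (option (b)). This is a sketch under assumptions: the seed and sizes are arbitrary, and (b) only covers this particular `lm()` model, not a more complex function.

```r
# Stand-in data with the same shape as ar1 in the question (2 x 10 x d1);
# seed and sizes are arbitrary assumptions for illustration
set.seed(1)
d1 <- 20
Time00 <- seq(as.Date("2023-02-01"), by = "day", length.out = d1)
ar1 <- array(runif(2 * 10 * d1), dim = c(2, 10, d1),
             dimnames = list(c("Depth", "Temp"), 0:9, as.character(Time00)))

fnd2 <- function(ar2) coef(lm(ar2[1, ] ~ ar2[2, ]))

# (a) Generic: apply() over margin 3 passes each 2 x 10 slice to fnd2;
#     the result is a 2 x d1 matrix with the dates as column names
res_apply <- apply(ar1, 3, fnd2)
rownames(res_apply) <- c("b", "a")

# (b) Only for simple linear regression: closed-form OLS for all days at once
D  <- ar1[1, , ]                        # 10 x d1 matrix of depths, one column per day
Tp <- ar1[2, , ]                        # 10 x d1 matrix of temperatures
Dc <- sweep(D, 2, colMeans(D))          # centre each day's depths
Tc <- sweep(Tp, 2, colMeans(Tp))        # centre each day's temperatures
a  <- colSums(Dc * Tc) / colSums(Tc^2)  # slope of depth ~ temp
b  <- colMeans(D) - a * colMeans(Tp)    # intercept
res_vec <- rbind(b = b, a = a)

# Compare against the original per-day loop (equal up to rounding)
res_loop <- sapply(seq_len(d1), function(i) coef(lm(ar1[1, , i] ~ ar1[2, , i])))
all.equal(unname(res_loop), unname(res_vec))
```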
