
Iterating through a list of parameters using tidyquant in R

I have a dataset which I want to process using tq_mutate and rollapply with different parameter values.

Currently I'm using a for loop to go over all the parameter values, but I'm sure this is not the most efficient or fastest way to do the task, especially once I am looking at a large number of parameter values. How could the for loop be improved or removed? I suspect the answer involves purrr::map or some other approach (multithreading, multiple cores, etc.), but I've not been able to find useful examples online.

Below is some sample code. Please ignore the simplicity of the dataset and of the output of my_bogus_function; they are for illustrative purposes only. What I want to do is iterate over many different V0 values.

library(dplyr)
library(tidyverse)
library(broom)
library(tidyquant)

my_bogus_function <- function(df, V0=1925) { 
  # WILL HAVE SOMETHING MORE SOPHISTICATED IN HERE BUT KEEPING IT SIMPLE
  # FOR THE PURPOSES OF THE QUESTION
  c(V0, V0*2)
}

window_size <- 7 * 24
cnames = c("foo", "bar")
df <- c("FB") %>%
    tq_get(get = "stock.prices", from = "2016-01-01", to = "2017-01-01") %>% 
    dplyr::select("date", "open")

# CAN THIS LOOP BE DONE IN A MORE EFFICIENT MANNER? 
for (i in (1825:1830)){
  df <- df %>% 
        tq_mutate(mutate_fun = rollapply,
                  width      = window_size,
                  by.column  = FALSE,
                  FUN        = my_bogus_function,
                  col_rename = gsub("$", sprintf(".%d", i), cnames), 
                  V0 = i
    )
}
# END OF THE FOR LOOP I WANT FASTER

Given that R uses a single core by default, I found an improvement by using the parallel, doSNOW and foreach packages, which allow the work to be spread over multiple cores. (Note that I'm on a Windows machine, so some other packages are not available.)

I'm sure there are other answers out there for multithreading, parallelising or vectorising this kind of code.
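As a rough sketch of the purrr route mentioned in the question: compute the rolling columns for each V0 separately with tq_transmute, then join the pieces back onto the data by date. This has not been benchmarked and runs sequentially, so it tidies the loop rather than necessarily speeding it up. The synthetic prices table, the shortened window and the helper name roll_one are assumptions made for illustration, not part of the original code.

```r
library(dplyr)
library(purrr)
library(tidyquant)

# Synthetic stand-in for the tq_get() download (assumption: avoids the network call)
set.seed(42)
prices <- tibble(
  date = seq(as.Date("2016-01-01"), by = "day", length.out = 30),
  open = 100 + cumsum(rnorm(30))
)

my_bogus_function <- function(df, V0 = 1925) {
  c(V0, V0 * 2)
}

window_size <- 7                 # shortened so the toy data has enough rows
cnames <- c("foo", "bar")

# Hypothetical helper: the rolling columns for a single V0, keyed by date
roll_one <- function(v0) {
  prices %>%
    tq_transmute(mutate_fun = rollapply,
                 width      = window_size,
                 by.column  = FALSE,
                 FUN        = my_bogus_function,
                 col_rename = paste0(cnames, ".", v0),
                 V0         = v0)
}

# One result per V0, then join them all back onto the original data
result <- 1825:1830 %>%
  map(roll_one) %>%
  reduce(function(acc, piece) left_join(acc, piece, by = "date"), .init = prices)
```

Because each V0 is handled by an independent call to roll_one, the map step is the part that could later be swapped for a parallel equivalent.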

Here is the foreach/doSNOW code for anyone interested.

library(dplyr)
library(tidyverse)
library(tidyquant)
library(parallel)
library(doSNOW)  
library(foreach)

window_size <- 7 * 24
cnames = c("foo", "bar")
df <- c("FB") %>%
  tq_get(get = "stock.prices", from = "2016-01-01", to = "2017-01-01") %>% 
  dplyr::select("date", "open")

my_bogus_function <- function(df, V0=1925) { 
  # WILL HAVE SOMETHING MORE SOPHISTICATED IN HERE BUT KEEPING IT SIMPLE
  # FOR THE PURPOSES OF THE QUESTION
  c(V0, V0*2)
}

# CAN THIS LOOP BE DONE IN A MORE EFFICIENT/FASTER MANNER? YES 
numCores <- detectCores() # get the number of cores available
cl <- makeCluster(numCores, type = "SOCK")
registerDoSNOW(cl) 

# Function to combine the outputs 
mycombinefunc <-  function(a,b){merge(a, b, by = c("date","open"))}

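# Run the loop over multiple cores
# .packages loads dplyr and tidyquant on each worker so %>% and tq_mutate are found there
meh <- foreach(i = 1825:1830, .combine = "mycombinefunc",
               .packages = c("dplyr", "tidyquant")) %dopar% {
  message(i)  # emitted on the worker, so it may not show in the main console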
  df %>% 
    # Adjust everything
    tq_mutate(mutate_fun = rollapply,
              width      = window_size,
              by.column  = FALSE,
              FUN        = my_bogus_function,
              col_rename = gsub("$", sprintf(".%d", i), cnames), 
              V0 = i
    )
}
stopCluster(cl)
# END OF THE FOR LOOP I WANTED FASTER
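
For comparison, the same split-then-merge pattern can be sketched with only the base parallel package: parLapply hands each parameter value to a worker, and Reduce with merge plays the role of the .combine function above. The toy prices data frame and the one_param helper are hypothetical stand-ins, so treat this as an illustration of the pattern rather than a drop-in replacement.

```r
library(parallel)

# Toy stand-in for the price data (assumption: replaces the tq_get() download)
prices <- data.frame(
  date = seq(as.Date("2016-01-01"), by = "day", length.out = 10),
  open = seq(100, 109)
)

# Hypothetical per-parameter worker: returns the key columns plus two new ones
one_param <- function(v0, data) {
  out <- data
  out[[paste0("foo.", v0)]] <- v0
  out[[paste0("bar.", v0)]] <- v0 * 2
  out
}

cl <- makeCluster(2)             # socket cluster, which also works on Windows
# If the worker function needs packages, load them on every worker first, e.g.
# clusterEvalQ(cl, library(tidyquant))
pieces <- parLapply(cl, 1825:1830, one_param, data = prices)
stopCluster(cl)

# Fold the per-parameter tables together on the shared key columns
combined <- Reduce(function(a, b) merge(a, b, by = c("date", "open")), pieces)
```

Starting and feeding the workers has a fixed cost, so for a handful of cheap parameter values the plain loop may well be just as quick; the gain should show up as the per-parameter work grows.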
