Iterating over a list of parameters with tidyquant in R

Question

I have a dataset that I want to process using tq_mutate and rollapply with different parameter values.

Currently I'm using a for loop to go over all the parameter values, but I doubt this is the most efficient or fastest way to do it, especially once I start looking at large numbers of parameter values. How could the for loop be improved or removed? I suspect the answer involves purrr::map or some other approach (multithreading, multiple cores, etc.), but I've not been able to find useful examples online.

Below is some sample code. Please ignore the simplicity of the dataset and of the function's output; they are for illustrative purposes only. What I want to do is iterate over many different V0 values.

library(tidyverse)   # attaches dplyr and the %>% pipe
library(tidyquant)

my_bogus_function <- function(df, V0=1925) { 
  # WILL HAVE SOMETHING MORE SOPHISTICATED IN HERE BUT KEEPING IT SIMPLE
  # FOR THE PURPOSES OF THE QUESTION
  c(V0, V0*2)
}

window_size <- 7 * 24
cnames = c("foo", "bar")
df <- c("FB") %>%
    tq_get(get = "stock.prices", from = "2016-01-01", to = "2017-01-01") %>% 
    dplyr::select("date", "open")

# CAN THIS LOOP BE DONE IN A MORE EFFICIENT MANNER? 
for (i in (1825:1830)){
  df <- df %>% 
        tq_mutate(mutate_fun = rollapply,
                  width      = window_size,
                  by.column  = FALSE,
                  FUN        = my_bogus_function,
                  col_rename = gsub("$", sprintf(".%d", i), cnames), 
                  V0 = i
    )
}
# END OF THE FOR LOOP I WANT FASTER

Answer 1

R runs on a single core by default, so I got a speed-up by using the parallel, doSNOW and foreach packages, which let the loop run on multiple cores. (Note that I'm on a Windows machine, so some other packages are not available.)

I'm sure there are other ways to multithread, parallelise or vectorise this code.

Here is the code for anyone interested.

library(dplyr)
library(tidyverse)
library(tidyquant)
library(parallel)
library(doSNOW)  
library(foreach)

window_size <- 7 * 24
cnames = c("foo", "bar")
df <- c("FB") %>%
  tq_get(get = "stock.prices", from = "2016-01-01", to = "2017-01-01") %>% 
  dplyr::select("date", "open")

my_bogus_function <- function(df, V0=1925) { 
  # WILL HAVE SOMETHING MORE SOPHISTICATED IN HERE BUT KEEPING IT SIMPLE
  # FOR THE PURPOSES OF THE QUESTION
  c(V0, V0*2)
}

# CAN THIS LOOP BE DONE IN A MORE EFFICIENT/FASTER MANNER? YES 
numCores <- detectCores() # get the number of cores available
cl <- makeCluster(numCores, type = "SOCK")
registerDoSNOW(cl) 

# Function to combine the outputs 
mycombinefunc <-  function(a,b){merge(a, b, by = c("date","open"))}

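# Run the loop over multiple cores.
# .packages loads the listed packages on each worker so that %>% and
# tq_mutate() can be found there.
meh <- foreach(i = 1825:1830, .combine = "mycombinefunc",
               .packages = c("dplyr", "tidyquant")) %dopar% {
  message(i)  # emitted on the worker, so it may not show up in the console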
  df %>% 
    # Adjust everything
    tq_mutate(mutate_fun = rollapply,
              width      = window_size,
              by.column  = FALSE,
              FUN        = my_bogus_function,
              col_rename = gsub("$", sprintf(".%d", i), cnames), 
              V0 = i
    )
}
stopCluster(cl)
# END OF THE FOR LOOP I WANTED FASTER
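
For comparison, the purrr::map route mentioned in the question could look roughly like the sketch below. It is not the answer's method: it runs sequentially, calls zoo::rollapply directly instead of going through tq_mutate, and uses a small synthetic data frame in place of the tq_get download so that it runs without a network connection. The names prices and roll_one, and the shortened window of 7, are assumptions made for illustration.

```r
library(dplyr)
library(purrr)
library(zoo)

# Synthetic stand-in for the downloaded prices (assumption: 30 daily rows)
prices <- tibble(
  date = as.Date("2016-01-01") + 0:29,
  open = 100 + 0:29
)

window_size <- 7                  # shortened from the original 7 * 24
cnames <- c("foo", "bar")

my_bogus_function <- function(x, V0 = 1925) {
  c(V0, V0 * 2)
}

# Hypothetical helper: the two rolling columns for a single V0 value
roll_one <- function(v0) {
  # One row per complete window, one column per value returned by FUN
  rolled <- rollapply(prices$open, width = window_size,
                      FUN = my_bogus_function, V0 = v0, align = "right")
  # Pad the leading incomplete windows with NA so the rows line up with prices
  padding <- matrix(NA_real_, nrow = window_size - 1, ncol = ncol(rolled))
  rolled <- rbind(padding, rolled)
  colnames(rolled) <- paste0(cnames, ".", v0)
  as_tibble(rolled)
}

# Map over the parameter values, then bind all results on in one step
result <- bind_cols(prices, map(1825:1830, roll_one))
```

Because every result has the same rows in the same order, the columns can be bound directly instead of merged pairwise on date and open. If the per-parameter work is heavy, the same helper could probably be handed to a parallel mapper such as furrr::future_map, although that is untested here.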


Answer 1 (0 votes, accepted, 2018-11-26 23:04:55)
