While loop in R quantstrat code - how to make it faster?

Question

In the quantstrat package, I have located one of the main culprits for the slowness of the applyRules function, and I wonder whether there is a more efficient way to write the while loop. Any feedback would be helpful, especially from anyone with experience wrapping this part in parallel R.

As an option, would apply work instead of while? Or should I rewrite this part as new functions, such as ruleProc and nextIndex? I am also delving into Rcpp, but that may be a stretch. Any help and constructive advice is much appreciated.

   while (curIndex) {
    timestamp = Dates[curIndex]
    if (isTRUE(hold) && holdtill < timestamp) {
        hold = FALSE
        holdtill = NULL
    }
    types <- sort(factor(names(strategy$rules), levels = c("pre",
        "risk", "order", "rebalance", "exit", "enter", "entry",
        "post")))
    for (type in types) {
        switch(type, pre = {
            if (length(strategy$rules[[type]]) >= 1) {
              ruleProc(strategy$rules$pre, timestamp = timestamp,
                path.dep = path.dep, mktdata = mktdata, portfolio = portfolio,
                symbol = symbol, ruletype = type, mktinstr = mktinstr)
            }
        }, risk = {
            if (length(strategy$rules$risk) >= 1) {
              ruleProc(strategy$rules$risk, timestamp = timestamp,
                path.dep = path.dep, mktdata = mktdata, portfolio = portfolio,
                symbol = symbol, ruletype = type, mktinstr = mktinstr)
            }
        }, order = {
            if (length(strategy$rules[[type]]) >= 1) {
              ruleProc(strategy$rules[[type]], timestamp = timestamp,
                path.dep = path.dep, mktdata = mktdata, portfolio = portfolio,
                symbol = symbol, ruletype = type, mktinstr = mktinstr)
            } else {
              if (isTRUE(path.dep)) {
                timespan <- paste("::", timestamp, sep = "")
              } else timespan = NULL
              ruleOrderProc(portfolio = portfolio, symbol = symbol,
                mktdata = mktdata, timespan = timespan)
            }
        }, rebalance = , exit = , enter = , entry = {
            if (isTRUE(hold)) next()
            if (type == "exit") {
              if (getPosQty(Portfolio = portfolio, Symbol = symbol,
                Date = timestamp) == 0) next()
            }
            if (length(strategy$rules[[type]]) >= 1) {
              ruleProc(strategy$rules[[type]], timestamp = timestamp,
                path.dep = path.dep, mktdata = mktdata, portfolio = portfolio,
                symbol = symbol, ruletype = type, mktinstr = mktinstr)
            }
            if (isTRUE(path.dep) && length(getOrders(portfolio = portfolio,
              symbol = symbol, status = "open", timespan = timestamp,
              which.i = TRUE))) {
            }
        }, post = {
            if (length(strategy$rules$post) >= 1) {
              ruleProc(strategy$rules$post, timestamp = timestamp,
                path.dep = path.dep, mktdata = mktdata, portfolio = portfolio,
                symbol = symbol, ruletype = type, mktinstr = mktinstr)
            }
        })
    }
    if (isTRUE(path.dep))
        curIndex <- nextIndex(curIndex)
    else curIndex = FALSE
}

Answer 1

Garrett's answer does point to the last major discussion on the R-SIG-Finance list where a related question was discussed.

The applyRules function in quantstrat is absolutely where most of the time is spent.

The while loop code copied in this question is the path-dependent part of the applyRules execution. I believe all of this is covered in the documentation, but I'll briefly review it for SO posterity.

We construct a dimension reduction index inside applyRules so that we don't have to observe and check every timestamp. We only take note of specific points in time where the strategy may reasonably be expected to act on the order book, or where orders may reasonably be expected to get filled.
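As a rough illustration of the idea, the sketch below reduces a toy signal matrix to the handful of row indices where something might happen and loops only over those. The names signals, dindex, and visited are hypothetical; this is not how quantstrat itself builds its index, which also has to account for open orders.

```r
# Toy signal matrix: one row per timestamp, one column per signal.
# Hypothetical data, for illustration only.
signals <- cbind(
  enter = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0),
  exit  = c(0, 0, 0, 0, 0, 0, 1, 0, 0, 0)
)

# Reduced index: keep only the rows where at least one signal fires.
dindex <- which(rowSums(signals != 0) > 0)

# The expensive, stateful loop now touches 2 rows instead of 10.
visited <- integer(0)
for (i in dindex) {
  # A real backtest would evaluate rules and the order book here.
  visited <- c(visited, i)
}
```

The fewer rows survive the reduction, the fewer iterations the path-dependent loop has to run.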

This is state-dependent and path-dependent code. Idle talk of 'vectorization' doesn't make any sense in this context. If I need to know the current state of the market, the order book, and my position, and if my orders may be modified in a time-dependent manner by other rules, I don't see how this code can be vectorized.

From a computer science perspective, this is a state machine. State machines in almost every language I can think of are usually written as while loops. This isn't really negotiable or changeable.
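A minimal sketch of such a state machine, under toy assumptions (a single position that is either flat or long, and a made-up signal vector), might look like the following. Whether a signal leads to a fill depends on the position left behind by earlier steps, so each step has to see the state produced by the previous one.

```r
# Hypothetical signal: 1 = want to be long, -1 = want to be flat, 0 = no opinion.
signal <- c(1, 1, 0, -1, 1, -1, -1, 0, 1)

position <- 0L        # state carried from one step to the next
fills <- integer(0)   # indices at which a trade actually happened
i <- 1L

while (i <= length(signal)) {
  if (signal[i] == 1 && position == 0L) {
    position <- 1L            # enter only if currently flat
    fills <- c(fills, i)
  } else if (signal[i] == -1 && position == 1L) {
    position <- 0L            # exit only if currently long
    fills <- c(fills, i)
  }
  i <- i + 1L
}
```

Here the repeated entry signal at index 2 and the repeated exit signal at index 7 do nothing, because the state at that point already matches what the signal asks for.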

The question asks whether using apply would help. The apply functions in R are implemented as loops, so no, it wouldn't help. Even a parallel apply such as mclapply or foreach can't help, because this is inside a state-dependent part of the code. Evaluating different time points without regard to state doesn't make any sense. You'll note that the non-state-dependent parts of quantstrat are vectorized wherever possible, and account for very little of the running time.

The comment made by John suggests removing the for loop in ruleProc. All that the for loop does is check each rule associated with the strategy at this point in time. The only compute-intensive part of that loop is the do.call that calls the rule function. The rest of the for loop simply locates and matches arguments for these functions and, according to code profiling, doesn't take much time at all. It would not make much sense to use a parallel apply here either, since the rule functions are applied in type order, so that cancels or risk directives can be applied before new entry directives. Much as mathematics has an order of operations, or a bank has a deposit/withdrawal processing order, quantstrat has a rule type evaluation order, as laid out in the documentation.
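The evaluation order comes from the sort(factor(...)) idiom visible in the loop quoted in the question. The sketch below isolates that idiom with a hypothetical strategy_rules list whose names are deliberately out of order.

```r
# Level order taken from the loop quoted in the question.
rule_levels <- c("pre", "risk", "order", "rebalance",
                 "exit", "enter", "entry", "post")

# Hypothetical rule list, stored in an arbitrary order.
strategy_rules <- list(enter = list(), exit = list(),
                       order = list(), risk = list())

# Sorting a factor orders by level position, not alphabetically.
types <- as.character(sort(factor(names(strategy_rules),
                                  levels = rule_levels)))
```

Sorting by factor level places risk before order, and exit before enter, regardless of the order in which the rules were added.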

To speed up execution, there are four main things that can be done:

write a non-path-dependent strategy: this is supported by the code, and simple strategies may be modeled this way. In this model you would write a custom rule function that calls addTxn directly when you think you should get your fills. It could be a vectorized function operating on your indicators/signals, and should be very fast.
preprocess your signals: if there are fewer places where the state machine needs to evaluate the state of the order book/rules/portfolio to see if it needs to do something, the speed increase is nearly linear with the reduction in signals. This is the area most users neglect, writing signal functions that don't really evaluate when action may be required that would modify positions or the order book.
explicitly parallelize parts of your analysis problem: I commonly write explicitly parallel wrappers to separate out different parameter evaluations or symbol evaluations; see applyParameter for an example using foreach.
rewrite the state machine inside applyRules in C/C++: patches welcome, but do see the link Garrett posted for additional details.
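To make the signal preprocessing point concrete, here is a small sketch in base R with made-up fast and slow indicator values. A "level" signal (fast above slow) is TRUE on every bar while the condition holds, whereas a "cross" signal is TRUE only on the bar where the condition first becomes true. The variable names are hypothetical, and this is only one way to thin out signals.

```r
# Hypothetical indicator values, for illustration only.
fast <- c(1, 2, 3, 4, 3, 2, 1, 2, 3, 4)
slow <- c(2, 2, 2, 2, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5)

# Level signal: TRUE on every bar where fast is above slow.
level <- fast > slow

# Cross signal: TRUE only where the level flips from FALSE to TRUE.
# The first bar is compared against FALSE, so a TRUE there counts as a cross.
previous <- c(FALSE, head(level, -1))
cross_up <- level & !previous

n_level <- sum(level)      # bars the state machine would visit with the level signal
n_cross <- sum(cross_up)   # bars it would visit with the cross signal
```

With these numbers the state machine would be woken up on 2 bars instead of 5. On real tick data the ratio between the two counts can be far larger.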

I can assure you that most strategies can run in a fraction of a core-minute per symbol per day on tick data, if a little care is taken with the signal generation functions. Running large backtests on a laptop is not recommended.

Ref: quantstrat - applyRules

Solution 1
7 2011-10-12 12:48:47