
What's the time complexity of Monte Carlo Tree Search?

I'm not sure whether this question belongs on Stack Overflow or cs.stackexchange.com, so please let me know if I should move it.

I'm trying to find the time complexity of Monte Carlo Tree Search (MCTS). Googling doesn't help, so I'm trying to see how far I can get calculating it myself.

MCTS repeats four steps for n iterations, or until time runs out. So we have:

O(n * (selection + expansion + simulation + backpropagation))
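For reference, the loop behind this formula can be sketched roughly as follows. This is only a minimal sketch on a made-up toy game (every move is legal, the game ends after `MAX_DEPTH` moves, and the reward is random); `B`, `MAX_DEPTH`, `Node`, `ucb1` and the other names are assumptions for illustration, not taken from any particular library.

```python
import math
import random

B = 3          # branching factor (assumed for this toy example)
MAX_DEPTH = 5  # the toy game ends after 5 moves (assumed)

class Node:
    def __init__(self, depth, parent=None):
        self.depth = depth
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

def ucb1(child, parent_visits, c=1.4):
    # Standard UCB1 score; unvisited children are tried first.
    if child.visits == 0:
        return math.inf
    exploit = child.value / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore

def select(node):
    # Descend while the node is fully expanded and not terminal.
    while node.depth < MAX_DEPTH and len(node.children) == B:
        parent_visits = node.visits
        node = max(node.children, key=lambda ch: ucb1(ch, parent_visits))
    return node

def expand(node):
    # Add one child; appending to a Python list is amortized O(1).
    if node.depth == MAX_DEPTH:
        return node  # terminal nodes cannot be expanded
    child = Node(node.depth + 1, parent=node)
    node.children.append(child)
    return child

def simulate(node, rng):
    # Random playout to the end of the toy game, then a random reward.
    for _ in range(MAX_DEPTH - node.depth):
        rng.randrange(B)  # pick a random move (no real state in this toy)
    return rng.random()

def backpropagate(node, reward):
    # Walk parent pointers back to the root, one update per level.
    while node is not None:
        node.visits += 1
        node.value += reward
        node = node.parent

def mcts(n, seed=0):
    rng = random.Random(seed)
    root = Node(0)
    for _ in range(n):  # the outer factor n in the formula
        leaf = select(root)
        child = expand(leaf)
        reward = simulate(child, rng)
        backpropagate(child, reward)
    return root

root = mcts(50)
```

Each pass through the `for` loop in `mcts` runs the four phases once, which is where the factor n times the sum of the per-phase costs comes from.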

Expansion just adds a child to the currently selected node. Assuming you're not using a singly linked list or something similar to store a node's children, this takes constant time, so we can drop it:

O(n * (selection + simulation + backpropagation))

Given the branching factor b and the total number of nodes in the tree t, I'm assuming the selection phase runs in O(b * log_b(t)), because the depth of the tree is log_b(t), and at every depth we go over b children.
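To sanity-check that estimate, here is a small sketch that builds a complete b-ary tree, walks from the root to a leaf while scoring every child on the way, and counts the evaluations. It assumes a perfectly balanced tree, as the estimate does; real MCTS trees are usually lopsided, so their depth can be larger than log_b(t). The function names are made up for this example, and random numbers stand in for the real selection scores.

```python
import math
import random

def build_complete_tree(b, depth):
    """Return a nested-list tree: a node is a list of b children, a leaf is []."""
    if depth == 0:
        return []
    return [build_complete_tree(b, depth - 1) for _ in range(b)]

def count_nodes(tree):
    """Total number of nodes, counting the root."""
    return 1 + sum(count_nodes(child) for child in tree)

def descend(tree, rng):
    """Walk root to leaf, scoring every child per level; return evaluations made."""
    evaluations = 0
    node = tree
    while node:  # an empty list is a leaf
        scores = [rng.random() for _ in node]  # stand-in for UCB1 scores
        evaluations += len(node)
        node = node[scores.index(max(scores))]
    return evaluations

b, depth = 3, 4
tree = build_complete_tree(b, depth)
t = count_nodes(tree)                       # 1 + 3 + 9 + 27 + 81 nodes
evaluations = descend(tree, random.Random(0))
bound = b * math.log(t, b)                  # b * log_b(t)
```

With b = 3 and depth 4 the tree has 121 nodes and one descent scores 3 children on each of 4 levels, which stays just under b * log_b(t).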

So our time complexity becomes:

O(n * (b * log_b(t) + simulation + backpropagation))

Backpropagation takes time proportional to the depth of the tree as well, but it only follows the path back to the root without looking at sibling nodes, so it contributes log_b(t) and that becomes:

O(n * (b * log_b(t) + simulation + log_b(t)))
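A minimal sketch of backpropagation, assuming each node stores a pointer to its parent (the `Node` class and field names here are hypothetical), shows why the cost is one update per level of the tree:

```python
class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.visits = 0
        self.total_reward = 0.0

def backpropagate(node, reward):
    """Update statistics from node up to the root; return the nodes touched."""
    steps = 0
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        node = node.parent
        steps += 1
    return steps

# Build a single root-to-leaf path of depth 4 (5 nodes in total).
root = Node()
leaf = root
for _ in range(4):
    leaf = Node(parent=leaf)

steps = backpropagate(leaf, reward=1.0)
```

The loop never looks at siblings, so the number of updates depends only on the depth of the node it starts from.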

But now I'm not sure how to add the simulation phase to this.

After we have selected a node to expand, we expand the node into m random children rather than a single child. Furthermore, rather than simulating out the child state only once, we simulate each child state k times.

  • m is the number of children of a node
  • k is the number of simulations per child

The runtime of the algorithm can simply be computed as O(mkI/C), where m and k are as defined above, I is the number of iterations, and C is the number of cores available.
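As a rough illustration of where the m * k * I factor comes from, the sketch below simply counts playouts. It assumes every playout costs one unit of work and that the C cores share the playouts perfectly evenly; both are simplifications, and the function names are made up for this example.

```python
import math

def count_simulations(m, k, iterations):
    """Count playouts: every iteration expands m children, each simulated k times."""
    total = 0
    for _ in range(iterations):
        for _child in range(m):
            for _sim in range(k):
                total += 1  # one playout, treated as unit cost (assumed)
    return total

def parallel_rounds(m, k, iterations, cores):
    """Rounds needed if `cores` playouts run at once with ideal load balancing."""
    return math.ceil(count_simulations(m, k, iterations) / cores)

total = count_simulations(m=4, k=5, iterations=10)
rounds = parallel_rounds(m=4, k=5, iterations=10, cores=8)
```

With m = 4, k = 5 and I = 10 that is 200 playouts, or 25 rounds on 8 cores under these assumptions.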

Reference:

http://stanford.edu/~rezab/dao/projects/montecarlo_search_tree_report.pdf
