
What is the most efficient way to access nodes of a tree stored in a NumPy array

Imagine we have a tree of values stored in a NumPy array. For example -

In [1]: import numpy as np

In [2]: tree = np.array([[0, 6], [0, 4], [1, 3], [2, 9], [3, 1], [2, 7]]);

In [3]: tree.shape
Out[3]: (6, 2)

Each node in the tree is a row in the array. The first row tree[0] is the root node [0, 6]. The first column tree[:,0] contains the row number of the node's parent, and the second column tree[:,1] contains the node's value attribute.

What is the most efficient way to access the value attributes of a given node and of its ancestors up to the root? For example, for the sixth node [2, 7], this would be [7, 3, 4, 6].

One method is to read the array iteratively from the starting node upwards, using the first column as the index of the next ancestor, for example -

In [20]: i = 5
    ...: values = []
    ...: while i > 0:
    ...:     values.append(tree[i, 1])
    ...:     i = tree[i, 0]
    ...: values.append(tree[0, 1])
    ...: print(values)
[7, 3, 4, 6]

but I found this to be slow for large, complex trees. Is there a faster way?

Background to my question - I am trying to implement Monte Carlo tree search (MCTS).

For an iterative operation like this, NumPy does not provide any (efficient) vectorized function. One way to speed it up is to use Numba (a JIT compiler) and return a NumPy array (since Numba can operate on them more efficiently). Here is an example:

import numba as nb
import numpy as np

@nb.njit(['(int16[:,:], int_)', '(int32[:,:], int_)', '(int64[:,:], int_)'])
def compute(tree, i):
    values = np.empty(max(tree.shape[0], 1), dtype=tree.dtype)
    cur = 0
    while i > 0:
        assert cur < values.size
        values[cur] = tree[i, 1]
        i = tree[i, 0]
        cur += 1
    assert cur < values.size
    values[cur] = tree[0, 1]
    return values[:cur+1] # Consider using copy() if cur << tree.shape[0]

print(compute(tree, 5))

It takes 0.76 µs on my machine, as opposed to 1.36 µs for the initial code. However, ~0.54 µs is spent calling the JIT function and checking the input parameters, and 0.1-0.2 µs is spent allocating the output array. Thus, basically 90% of the Numba function's time is constant overhead, so it should be much faster for large trees. If you have many small trees to compute, you can call it from another Numba function so as to avoid the overhead of calling a JIT function from the slow CPython interpreter. When called from a JIT function, the function above takes only 0.063 µs on the example input. Thus, the Numba function can be up to 22 times faster in this case.

Note that it is better to use a small datatype for the tree, since random accesses are expensive in large arrays: the smaller the array in memory, the more likely it is to fit in the CPU caches, and the faster the computation. For trees with fewer than 65536 items, it is safe to use a uint16 datatype (while the default one is int32 on Windows and int64 on Linux, that is, respectively 2 and 4 times bigger).
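As a minimal illustration of the memory saving (storing the example tree with an explicit int64 dtype versus uint16):

```python
import numpy as np

# Same tree, stored explicitly as int64 and as uint16
tree64 = np.array([[0, 6], [0, 4], [1, 3], [2, 9], [3, 1], [2, 7]], dtype=np.int64)
tree16 = tree64.astype(np.uint16)  # safe: all indices and values fit in 16 bits

print(tree64.nbytes, tree16.nbytes)  # 96 24 -> 4 times smaller in memory
```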

Not sure this can help. It depends on how often you do this operation, and also on how big and how deep your tree is.

But basically, if you need to accelerate this, my suggestion would be to precompute everything for every node; then you can do it NumPy-style:

preList = [tree]
idx = tree[:, 0]
# Each new layer maps every node to its next ancestor's row
while (preList[-1][:, 0] != 0).any():
    preList.append(preList[-1][idx])
# One extra layer so the root row itself is present even for the deepest nodes
preList.append(preList[-1][idx])
pre = np.stack(preList)  # shape: (depth + 1, number of nodes, 2)

# Values of 6th node
pre[:, 5][:, 1]
# array([7, 3, 4, 6, 6])

Note that every column now has the same length (the tree depth plus one), repeating the root value if needed. But you can truncate a column at the first entry of pre[:, 5][:, 0] that is 0: that layer is a child of the root, and the next layer is the root itself.
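A sketch of that truncation, with a hypothetical `path_values` helper (the construction is repeated here for self-containment, with one extra layer appended so the root row is always present, even for the deepest nodes):

```python
import numpy as np

tree = np.array([[0, 6], [0, 4], [1, 3], [2, 9], [3, 1], [2, 7]])

# Precompute all ancestors, plus one extra layer containing the root row
preList = [tree]
idx = tree[:, 0]
while (preList[-1][:, 0] != 0).any():
    preList.append(preList[-1][idx])
preList.append(preList[-1][idx])
pre = np.stack(preList)  # shape: (depth + 1, number of nodes, 2)

def path_values(node):
    if node == 0:                                 # root: its path is just its own value
        return pre[:1, 0, 1]
    k = int(np.argmax(pre[:, node, 0] == 0))      # first layer whose parent is the root
    return pre[:k + 2, node, 1]                   # keep that layer plus the root layer

print(path_values(5))  # [7 3 4 6]
print(path_values(4))  # [1 9 3 4 6]
```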

This is just the same thing you are doing (from a row #i = [j, v], getting the parent row #j), just done once and for all on all nodes, to get a 3D matrix whose first axis is the ancestor axis.

Note that if the computation time is unbearable with your current algorithm, meaning that your tree is very deep, then chances are that mine would suffer from heavy memory usage, since my 3D matrix has the size of your 2D matrix times the tree depth.

As for CPU usage, even if, from a strict number-of-operations point of view, all I do is precompute your algorithm for every node in the tree, it is probably worth it even if you do not need that much computation, because NumPy array indexing is simply faster.

With a tree as small as yours, it takes 46 such requests before my method is cheaper than yours (it takes 46 requests to absorb the cost of the precomputation), which is not good, considering that you have only 6 nodes.

But for a 13-node tree, the precomputation takes 76 µs, your code needs 3.12 µs/request, and mine 350 ns, so the number of requests before it is worth it drops to 27. Still more than the number of nodes (13).

For a 27-node tree, the precomputation takes 84 µs, your code needs 3.81 µs/request, and mine still 350 ns, so the number of requests for which the precomputation is profitable drops to 24.

In CPU time, the precomputation is O(n log n), your code is O(log n) per request, and my request is O(1). So, in theory, if the number of requests is k, that is O(n log n + k) on my side and O(k log n) on yours, which become equivalent when k ~ n. As I said, it is just like calling your code on all possible nodes, but because of NumPy's efficiency, the precomputation costs less than calling your code n times. So it is worth it even if k is somewhat smaller than n.
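To make that equivalence concrete, here is a small self-check (names are illustrative) that builds a random 50-node tree in the same layout (parent[i] < i, row 0 is the root) and verifies that the precomputed matrix reproduces the iterative loop for every node:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
# Random tree in the same layout: each parent index is smaller than the node index
parents = np.concatenate(([0], rng.integers(0, np.arange(1, n))))
values = rng.integers(0, 100, n)
tree = np.column_stack([parents, values])

def iterative_path(tree, i):
    # The question's original parent-chasing loop
    vals = []
    while i > 0:
        vals.append(tree[i, 1])
        i = tree[i, 0]
    vals.append(tree[0, 1])
    return vals

# Precompute, with one extra layer so the root row is always present
preList = [tree]
idx = tree[:, 0]
while (preList[-1][:, 0] != 0).any():
    preList.append(preList[-1][idx])
preList.append(preList[-1][idx])
pre = np.stack(preList)

def precomputed_path(node):
    k = int(np.argmax(pre[:, node, 0] == 0))  # first layer whose parent is the root
    return pre[:k + 2, node, 1].tolist()

all_match = all(precomputed_path(i) == iterative_path(tree, i) for i in range(1, n))
print(all_match)  # True
```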


 