Why is this recursive function 3x faster than the iterative one?

Question

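I have a simple recursive function that builds a binary tree of a given depth.

I expected an iterative version with a DFS stack to perform similarly, but surprisingly it is 3x slower!

More precisely, on my machine the recursive version takes ~330_000 ns at depth 15, while the iterative version with a stack takes ~950_000 ns.

Can the surprising difference be attributed to superior cache locality (which should obviously be better for the recursive function)?

The code I use for the benchmark: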

class Main {
    public static void main(String[] args) {
        long startTime = System.nanoTime();
        long runs;
        Tree t = null;
        for(runs=0; (System.nanoTime() - startTime)< 3_000_000_000L ; runs++) {
            t = createTree3(15);
        }
        System.out.println((System.nanoTime() - startTime) / runs + " ns/call");
    }

    static Tree createTree(int depth) {
        Tree t = new Tree();
        createTreeHlp(t, depth);
        return t;
    }

    static void createTreeHlp(Tree tree, int depth) {
        if (depth == 0)
            tree.init(0, null, null);
        else {
            tree.init(depth, new Tree(), new Tree());
            createTreeHlp(tree.leftChild, depth -1);
            createTreeHlp(tree.rghtChild, depth -1);
        }
    }


    static Tree createTree3(int depth_) {
        TreeStack stack = new TreeStack();
        Tree result = new Tree();
        stack.put(result, depth_);
        while (!stack.isEmpty()) {
            int depth = stack.depth[stack.stack][stack.index];
            Tree tree = stack.tree[stack.stack][stack.index];
            stack.dec();
            if (depth == 0)
                tree.init(0, null, null);
            else {
                tree.init(depth, new Tree(), new Tree());
                stack.put(tree.leftChild, depth -1);
                stack.put(tree.rghtChild, depth -1);
            }
        }
        return result;
    }
}

class Tree {
    int payload;
    Tree leftChild;
    Tree rghtChild;

    public Tree init(int payload, Tree leftChild, Tree rghtChild) {
        this.leftChild = leftChild;
        this.rghtChild = rghtChild;
        this.payload = payload;
        return this;
    }

    @Override
    public String toString() {
        return "Tree(" +payload+", "+ leftChild + ", " + rghtChild + ")";
    }
}
class TreeStack {

    Tree[][] tree;
    int[][] depth;

    int stack =  1;
    int index = -1;

    TreeStack() {
        this.tree = new Tree[100][];
        this.depth = new int[100][];

        alloc(100_000);
        --stack;
        alloc(0);
    }

    boolean isEmpty() {
        return index == -1;
    }

    void alloc(int size) {
        tree[stack] = new Tree[size];
        depth[stack] = new int[size];
    }

    void inc() {
        if (tree[stack].length == ++index) {
            if (tree[++stack] == null) alloc(2 * index);
            index = 0;
        }
    }
    void dec() {
        if (--index == -1)
            index = tree[--stack].length - 1;
    }

    void put(Tree tree, int depth) {
        inc();
        this.tree[stack][index] = tree;
        this.depth[stack][index] = depth;
    }
}

Answer 1

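Short answer: because you coded it that way.

Long answer: you create a stack, put things into it, take things out of it, and make the whole thing very complicated along the way. Let's do it simply, just for this case. You want a tree of a given depth with all of its children, where each node's value is its depth, and you want the deepest level first. Here is a simple approach: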

static Tree createTree3(int depth_) {
    Tree[] arr = new Tree[1 << depth_];

    int count = 1 << depth_;
    for (int i=0; i<count; i++)
        arr[i] = new Tree().init(0, null, null);

    int d = 1;
    count >>= 1;
    while (count > 0)
    {
        for (int i=0; i<count; i++)
        {
            Tree t = new Tree().init(d, arr[i * 2], arr[i * 2 + 1]);
            arr[i] = t;
        }
        count >>= 1;
        d++;
    }

    return arr[0];
}

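It first creates the nodes of the lowest level, of which there are 2^depth. Then it creates the next level of nodes and attaches the children to them. Then the next level, and the next. No stack, no recursion, just simple loops.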
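One way to convince yourself that the bottom-up loop really produces the same tree as the recursion is to compare the two results node by node. The following is only a sketch: `TreeShapeCheck`, `buildRecursive`, `buildBottomUp`, `sameShape` and `countNodes` are names made up for this illustration, and the recursive builder is a condensed form of the one in the question, not a copy of it.

```java
public class TreeShapeCheck {
    // Same fields as the Tree class in the question.
    static class Tree {
        int payload;
        Tree leftChild;
        Tree rghtChild;

        Tree init(int payload, Tree leftChild, Tree rghtChild) {
            this.payload = payload;
            this.leftChild = leftChild;
            this.rghtChild = rghtChild;
            return this;
        }
    }

    // Condensed recursive builder (hypothetical helper for this check).
    static Tree buildRecursive(int depth) {
        if (depth == 0) return new Tree().init(0, null, null);
        return new Tree().init(depth, buildRecursive(depth - 1), buildRecursive(depth - 1));
    }

    // Bottom-up builder: create all leaves, then pair nodes up level by level.
    static Tree buildBottomUp(int depth) {
        int count = 1 << depth;
        Tree[] arr = new Tree[count];
        for (int i = 0; i < count; i++) {
            arr[i] = new Tree().init(0, null, null);
        }
        for (int d = 1; d <= depth; d++) {
            count >>= 1;
            for (int i = 0; i < count; i++) {
                // Slots 2i and 2i+1 are read before slot i is overwritten.
                arr[i] = new Tree().init(d, arr[2 * i], arr[2 * i + 1]);
            }
        }
        return arr[0];
    }

    // True if both trees have the same structure and payloads.
    static boolean sameShape(Tree a, Tree b) {
        if (a == null || b == null) return a == b;
        return a.payload == b.payload
                && sameShape(a.leftChild, b.leftChild)
                && sameShape(a.rghtChild, b.rghtChild);
    }

    static int countNodes(Tree t) {
        return t == null ? 0 : 1 + countNodes(t.leftChild) + countNodes(t.rghtChild);
    }

    public static void main(String[] args) {
        Tree r = buildRecursive(4);
        Tree b = buildBottomUp(4);
        System.out.println("same shape: " + sameShape(r, b));
        System.out.println("nodes: " + countNodes(b));
    }
}
```

A full binary tree of depth d has 2^(d+1) - 1 nodes, so the node count is a cheap extra sanity check.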

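I benchmarked it by running it 20000 times at depth 14, so there is no need to fetch the time or anything else inside the loop; it just creates trees. Results on my i7 laptop:

Your recursive version takes ~187 µs/tree
My iterative version takes ~177 µs/tree

If I run depth 15, it is 311 versus 340.

The timings fluctuate because this measures wall-clock (system) time rather than CPU time, and they depend on whether the JIT compiler happens to do things differently, and so on.

But in short: in this case, even a simple change like this easily makes the iterative version as fast as the recursive one, and I am sure there are even smarter ways.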
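To reduce some of that noise, a common approach is to run the code for a while before measuring (so the JIT compiler has a chance to compile it) and then take the best of several measured rounds. The sketch below is an assumption about how one might do that, not the harness used for the numbers above; `WarmupBench`, `Node`, `build`, `nanosPerCall`, `bestOf` and the `sink` field are hypothetical names. A dedicated tool such as JMH is generally a more reliable choice for serious measurements.

```java
import java.util.function.IntFunction;

public class WarmupBench {
    // Minimal stand-in for the Tree class, so the sketch is self-contained.
    static class Node {
        int payload;
        Node left;
        Node right;
    }

    // Simple recursive builder used as the code under test.
    static Node build(int depth) {
        Node n = new Node();
        n.payload = depth;
        if (depth > 0) {
            n.left = build(depth - 1);
            n.right = build(depth - 1);
        }
        return n;
    }

    // Results are accumulated here so the JIT cannot discard the work as unused.
    static long sink;

    // Average wall-clock nanoseconds per call over `calls` invocations.
    static long nanosPerCall(IntFunction<Node> builder, int depth, int calls) {
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            sink += builder.apply(depth).payload;
        }
        return (System.nanoTime() - start) / calls;
    }

    // Warm up first, then report the fastest of `rounds` measured rounds.
    static long bestOf(IntFunction<Node> builder, int depth,
                       int warmupCalls, int calls, int rounds) {
        nanosPerCall(builder, depth, warmupCalls); // result ignored on purpose
        long best = Long.MAX_VALUE;
        for (int r = 0; r < rounds; r++) {
            best = Math.min(best, nanosPerCall(builder, depth, calls));
        }
        return best;
    }

    public static void main(String[] args) {
        long ns = bestOf(WarmupBench::build, 14, 2_000, 2_000, 5);
        System.out.println(ns + " ns/call (best of 5 rounds)");
    }
}
```

This still measures wall-clock time, so other processes and garbage collection can influence the result; taking the minimum only filters out some of the outliers.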
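If you do want to keep an explicit DFS stack, it can also be much simpler than the hand-rolled `TreeStack`. The sketch below uses `java.util.ArrayDeque` and stores the remaining depth in `payload` when a node is created, so only the node itself has to be pushed. `DequeTreeBuilder` and its methods are names invented for this example, and no timing claim is made for it; you would have to measure it on your own machine.

```java
import java.util.ArrayDeque;

public class DequeTreeBuilder {
    // Same fields as the Tree class in the question.
    static class Tree {
        int payload;
        Tree leftChild;
        Tree rghtChild;
    }

    // Creates a node whose payload is the depth still to be built below it.
    static Tree node(int depth) {
        Tree t = new Tree();
        t.payload = depth;
        return t;
    }

    static Tree createTree(int depth) {
        Tree root = node(depth);
        ArrayDeque<Tree> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Tree t = stack.pop();
            if (t.payload == 0) {
                continue; // leaf: children stay null
            }
            t.leftChild = node(t.payload - 1);
            t.rghtChild = node(t.payload - 1);
            stack.push(t.leftChild);
            stack.push(t.rghtChild);
        }
        return root;
    }

    static int countNodes(Tree t) {
        return t == null ? 0 : 1 + countNodes(t.leftChild) + countNodes(t.rghtChild);
    }

    public static void main(String[] args) {
        System.out.println("nodes: " + countNodes(createTree(15)));
    }
}
```

Because the traversal is depth-first, the stack only ever holds about one pending node per level, so it stays tiny even for deep trees.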
