JavaScript: How to control flow with async recursive tree traversal?

Question

I need to recurse over a tree and perform asynchronous operations on specific nodes. How can I control the flow so that I have access to the nodes once all of the operations are done?

Here's an example situation:

data = {
  name: "deven",
  children: [
    { name: "andrew" },
    { name: "donovan" },
    { name: "james",
      children: [
        { name: "donatello" },
        { name: "dan" }
      ]
    },
    { name: "jimmy",
      children: [
        { name: "mike" },
        { name: "dank" }
      ]
    }
  ]
};

I have a function whose goal is to iterate through the tree and capitalize all names that start with 'd'. Afterwards, I want to pass the tree into another function to do some more work (possibly delete all nodes that have a name that starts with 'a'), but only after the initial processing has been done:

function capitalize_d(node) {
    // Match names that start with 'd', as described above.
    if(node.name.charAt(0) === "d") {
        node.name = node.name.toUpperCase();
    }

    if(node.children != null) {
        for(var i = 0; i < node.children.length; i++) {
            capitalize_d(node.children[i]);
        }
    }
}

function remove_a(node) {
}

capitalize_d(data);

// Should only get called after all the d's have been capitalized.
remove_a(data);

The above code works fine, because capitalize_d is blocking. If capitalize_d recurses asynchronously, how can we guarantee remove_a is called after it's done? Note the setTimeout call in capitalize_d.

function capitalize_d(node) {
    setTimeout(function() {

        // Match names that start with 'd', as described above.
        if(node.name.charAt(0) === "d") {
            node.name = node.name.toUpperCase();
        }

        if(node.children != null) {
            for(var i = 0; i < node.children.length; i++) {
                capitalize_d(node.children[i]);
            }
        }

    }, 1);
}

function remove_a(node) {
}

capitalize_d(data);

// Should only get called after all the d's have been capitalized.
remove_a(data);

The problem is that processing for different branches of the tree gets fired off at the same time, and it's impossible to tell when the whole tree has finally been processed.

How can I solve this?

Answer 1

Let me summarize what I understood of your requirements:

you have a data tree where each single node can be altered asynchronously by a set of operations (capitalize_d and remove_a in your example),
you want to make sure every single node has been subjected to a given operation before allowing the next one.

I have spent 10 years or more designing real-time embedded software, and believe me, the requirements in this area are meaner and scarier than anything most network programmers will experience in their entire life. This makes me warn you that you seem to be headed the seriously wrong way here.

As I imagine it, your problem is to organize a set of individual data in some sort of meaningful structure. Some process collects random bits of information (what you call 'nodes' in your example), and at some point you want to put all those nodes into a consistent, monolithic data structure (a hierarchical tree in your example).

In other words, you have three tasks at hand:

a data acquisition process that will collect nodes asynchronously
a data production process that will present a consistent, refined data tree
a controller process that will synchronize data acquisition and production (possibly the user interface directly, if the two processes above are smart enough, but don't count on it too much).

My advice: don't try to do acquisition and production at the same time.

Just to give you an idea of the nightmare you're headed for:

depending on how the operations are triggered, there is a possibility that the tree will never be completely processed by a given operation. Let's say the controlling software forgets to call capitalize_d on a few nodes: remove_a will simply never get the green light
conversely, if you fire at the tree at random, it is very likely that some nodes will get processed multiple times, unless you keep track of the operation coverage to prevent applying the same transformation twice to a given node
if you want remove_a processing to ever start, you might have to prevent the controlling software from sending any more capitalize_d requests, or else the light might stay stuck on red forever. You will end up doing flow control over your requests one way or another (or worse: you will not do any, and your system will be likely to freeze to death should the operation flow wander away from the sweet spot you happened to hit by chance).
if an operation alters the structure of the tree (as remove_a obviously does), you have to prevent concurrent accesses. At the very least, you should lock the subtree starting from the node remove_a is working on, or else you would allow processing a subtree that is likely to be asynchronously altered and/or destroyed.

Well, it's doable. I've seen fine young men earning big money doing variations on this theme. They usually spent a couple of evenings each week eating pizzas in front of their computers too, but hey, that's how you can tell hardboiled hackers from the crowd of quiche eaters, right?...

I assume that your posting this question here means you don't really want to do this. Now if your boss does, well, to quote a famous android, I can't lie to you about your chances, but... you have my sympathies.

Now seriously, folks. This is how I would tackle the problem.

1) take a snapshot of your data at a given point in time

you can weed out raw data using as many criteria as you can (last data acquisition too old, incorrect input, whatever allows you to build the smallest possible tree).

2) build the tree with the snapshot, and then apply whatever capitalize_d, remove_a and camelize_z operations sequentially on this given snapshot.

In parallel, the data acquisition process will continue collecting new nodes or updating existing ones, ready to take the next snapshot.
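A minimal sketch of these two steps, under some assumptions: the tree is plain JSON-compatible data (so a JSON round trip is a good enough deep copy), and remove_a drops every child whose name starts with 'a'. The question leaves remove_a empty, so that body, liveData and takeSnapshot are hypothetical names used only for illustration.

```javascript
// Hypothetical live tree that the acquisition process keeps mutating.
var liveData = {
    name: "deven",
    children: [
        { name: "andrew" },
        { name: "james", children: [{ name: "dan" }] }
    ]
};

// Step 1: freeze a deep copy of the data at this point in time.
// Assumes the tree holds only JSON-compatible values.
function takeSnapshot(tree) {
    return JSON.parse(JSON.stringify(tree));
}

// Step 2a: upper-case every name that starts with "d".
function capitalize_d(node) {
    if (node.name.charAt(0) === "d") {
        node.name = node.name.toUpperCase();
    }
    (node.children || []).forEach(capitalize_d);
}

// Step 2b (assumed behavior): drop every child whose name starts with "a".
// Note that this leaves an empty children array on leaf nodes.
function remove_a(node) {
    node.children = (node.children || []).filter(function (child) {
        return child.name.charAt(0) !== "a";
    });
    node.children.forEach(remove_a);
}

// Both operations are synchronous on the snapshot, so plain
// statement order is all the flow control that is needed.
var snapshot = takeSnapshot(liveData);
capitalize_d(snapshot);
remove_a(snapshot);

console.log(JSON.stringify(snapshot));
```

Because the operations only ever touch the copy, the acquisition side can keep changing liveData without any locking.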

Besides, you can move some of your processing forward. Obviously capitalize_d does not take any advantage of the tree structure, so you can apply capitalize_d to each node in the snapshot before the tree is even built. You might even be able to apply some of the transformations earlier, i.e. on each collected sample. This can save you a lot of processing time and code complexity.
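One way this could look, as a rough sketch: assume the snapshot is a flat list of samples, each carrying a hypothetical id and parentId. Neither field appears in the question; they are only there so that a tree can be assembled afterwards. The per-node transformation then runs over the flat list, and the tree is built from the already transformed records.

```javascript
// Hypothetical flat snapshot: id and parentId are assumed fields.
var samples = [
    { id: 1, parentId: null, name: "deven" },
    { id: 2, parentId: 1, name: "andrew" },
    { id: 3, parentId: 1, name: "james" },
    { id: 4, parentId: 3, name: "dan" }
];

// Per-sample transformation: needs no knowledge of the tree.
function capitalizeIfD(sample) {
    var startsWithD = sample.name.charAt(0) === "d";
    return {
        id: sample.id,
        parentId: sample.parentId,
        name: startsWithD ? sample.name.toUpperCase() : sample.name
    };
}

// Assemble the tree from flat records in two passes:
// first create every node, then attach each one to its parent.
function buildTree(records) {
    var byId = {};
    var root = null;
    records.forEach(function (record) {
        byId[record.id] = { name: record.name, children: [] };
    });
    records.forEach(function (record) {
        if (record.parentId === null) {
            root = byId[record.id];
        } else {
            byId[record.parentId].children.push(byId[record.id]);
        }
    });
    return root;
}

var tree = buildTree(samples.map(capitalizeIfD));
console.log(JSON.stringify(tree));
```

Structural operations such as remove_a would still run on the assembled tree, since they depend on parent/child relationships.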

To end with a bit of theoretical babble,

your approach is to consider the data tree a shared object that should support concurrent access from the data acquisition and data production processes,
my approach is to make the data acquisition process serve (asynchronously) consistent sets of data to a data production process, which can then handle the said set of data sequentially.

the data production process could be triggered on demand (say when the end user clicks the "show me something" button), in which case the reactivity would be rather poor: the user would be stuck watching an hourglass or whatever Web2.0 sexy spinning wheel for the time needed to build and process the tree (let's say 7-8 seconds).

or you could activate the data production process periodically (feed it with a new snapshot every 10 seconds, a period safely above the mean processing time of a data set). The "show me something" button would then just present the last set of completed data. An immediate answer, but with data that could be 10 seconds older than the last received samples.

I rarely saw cases where this was not considered acceptable, especially when you produce a bunch of complex data that an operator needs a few dozen seconds to digest.
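The periodic variant could be sketched roughly as follows. All names here (produce, startProduction, showMeSomething, latestResult) are hypothetical, and the snapshot source and operations are passed in as plain synchronous functions, which is an assumption about how the two processes would be wired together.

```javascript
// Last fully processed tree; null until the first run completes.
var latestResult = null;

// One production run: snapshot, apply every operation in order, publish.
function produce(takeSnapshot, operations) {
    var tree = takeSnapshot();
    operations.forEach(function (operation) {
        operation(tree);
    });
    // Publish only once the whole pipeline has finished,
    // so readers never see a half-processed tree.
    latestResult = tree;
}

// Run once immediately, then again every periodMs milliseconds.
function startProduction(takeSnapshot, operations, periodMs) {
    produce(takeSnapshot, operations);
    return setInterval(function () {
        produce(takeSnapshot, operations);
    }, periodMs);
}

// What the "show me something" button handler would return.
function showMeSomething() {
    return latestResult;
}

// Usage with a stand-in snapshot source and a single operation.
var timer = startProduction(
    function () { return { name: "deven" }; },
    [function (tree) { tree.name = tree.name.toUpperCase(); }],
    10000
);
console.log(showMeSomething().name);
clearInterval(timer); // stop the periodic runs so this demo can exit
```

The button handler never waits on processing; it only reads whatever the last completed run published.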

In theory, my approach will lose some reactivity, since the data processed will be slightly outdated, but the concurrent access approach would probably result in even slower software (and most certainly 5-10 times bigger and buggier).

Answer 2

I know this post is old, but it came up in search results, and the lone response doesn't provide a working example, so here's a modified version of something I recently did...

function processTree(rootNode, onComplete) {

    // Count of outstanding requests.
    // Upon a return of any request,
    // if this count is zero, we know we're done.
    var outstandingRequests = 0;

    // Responses collected so far, keyed by node uid.
    // This also gives us something to inspect
    // in the sample test code. :)
    var processedNodes = {};

    // Uids of nodes whose request has already been kicked off,
    // which is used to handle artifacts
    // of non-tree graphs (cycles, etc).
    // Technically, since we're processing a "tree",
    // this logic isn't needed, and could be
    // completely removed.
    var visited = {};

    function markRequestStart() {
        outstandingRequests++;
    }

    function markRequestComplete() {
        outstandingRequests--;
        // We're done, let's execute the overall callback
        if (outstandingRequests < 1) { onComplete(processedNodes); }
    }

    function processNode(node) {
        // Kickoff request for this node
        visited[node.uid] = true;
        markRequestStart();
        // (We use a regular HTTP GET request as a
        // stand-in for any asynchronous action)
        jQuery.get("/?uid=" + encodeURIComponent(node.uid), function(data) {
            processedNodes[node.uid] = data;
        }).fail(function() {
            console.log("Request failed!");
        }).always(function() {
            // When the request returns:
            // 1) Mark it as complete in the ref count
            // 2) Execute the overall callback if the ref count hits zero
            markRequestComplete();
        });

        // Recursively process all child nodes (kicking off requests for each)
        (node.children || []).forEach(function (childNode) {
            // Only process nodes not already visited
            // (only happens for non-tree graphs,
            // which could include cycles or multi-parent nodes)
            if (!visited[childNode.uid]) {
                processNode(childNode);
            }
        });

    }

    processNode(rootNode);
}

And here's example usage using QUnit:

QUnit.test( "async-example", function( assert ) {
    var done = assert.async();

    var root = {
        uid: "Root",
        children: [{
            uid: "Node A",
            children: [{
                uid: "Node A.A",
                children: []
            }]
        },{
            uid: "Node B",
            children: []
        }]
    };

    processTree(root, function(processedNodes) {
        assert.equal(Object.keys(processedNodes).length, 4);
        assert.ok(processedNodes['Root']);
        assert.ok(processedNodes['Node A']);
        assert.ok(processedNodes['Node A.A']);
        assert.ok(processedNodes['Node B']);
        done();
    });
});
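For comparison, here is a sketch of the same idea applied to the question's setTimeout version of capitalize_d, using Promises instead of a hand-maintained counter. This is a swapped-in technique, not what the code above does: Promise.all takes over the job of tracking outstanding work. The delay helper and the body of remove_a are assumptions made for illustration (the question leaves remove_a empty).

```javascript
// Resolves after delayMs; stands in for any asynchronous operation.
function delay(delayMs) {
    return new Promise(function (resolve) {
        setTimeout(resolve, delayMs);
    });
}

// Returns a promise that resolves once this node
// and all of its descendants have been processed.
function capitalize_d(node) {
    return delay(1).then(function () {
        if (node.name.charAt(0) === "d") {
            node.name = node.name.toUpperCase();
        }
        // Branches still run concurrently; Promise.all waits for all of them.
        return Promise.all((node.children || []).map(capitalize_d));
    });
}

// Assumed behavior: drop every child whose name starts with "a".
function remove_a(node) {
    node.children = (node.children || []).filter(function (child) {
        return child.name.charAt(0) !== "a";
    });
    node.children.forEach(remove_a);
}

var data = {
    name: "deven",
    children: [
        { name: "andrew" },
        { name: "donovan" },
        { name: "james", children: [{ name: "donatello" }, { name: "dan" }] }
    ]
};

// remove_a only runs after every capitalize_d call has finished.
var finished = capitalize_d(data).then(function () {
    remove_a(data);
    return data;
});

finished.then(function (tree) {
    console.log(JSON.stringify(tree));
});
```

The trade-off is mostly one of style: the counter works with any callback-based API, while the Promise version makes "done" a value that can be returned and chained.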

2 answers

Answer 1
3 2013-12-29 21:31:14

Answer 2
2 2016-05-22 20:07:37
