Limit concurrent operations in Node.js

This is web scraping code written in Node.js.
Will this code always keep 5 concurrent requests in flight when the queue has enough URLs?
Why does the console show otherwise?

var request = require("request");
var cheerio = require("cheerio");
var fs = require('fs');

var concurrent_requests = 0;
var queue = [];
var baseUrl = "https://angularjs.org/";

function makeApiCall(url){
    if(url) {
        queue.unshift(url);
    }
    if(concurrent_requests<5) {
        var nextUrl = queue.pop();
        if(nextUrl) {
            concurrent_requests++;
            request(nextUrl, function (error, response, body) {
                var invalidUrl;
                concurrent_requests--;
                if(body) {
                    var $ = cheerio.load(body);
                    var anchors = $("a");
                    var data = "";
                    for (var i = 0; i < anchors.length; i++) {
                        url = $(anchors[i]).attr("href");
                        if(!url || url === "#" || url === "javascript:void(0)"){
                            invalidUrl = true;
                        }
                        else{
                             invalidUrl = false;
                        }

                        if (!invalidUrl) {
                            makeApiCall(url);
                            data += url + ", " + nextUrl + "\n";
                        }
                    }
                    //console.log(data);
                    fs.appendFile('urls.csv',data, function (err) {
                        if (err) throw err;
                    });
                }
                else{
                    makeApiCall();
                }
            });
        }
    }
     console.log(concurrent_requests);

}


makeApiCall(baseUrl);

Because your if statement has a condition that prevents more than 5 requests from being in flight at once:

if(concurrent_requests<5) {

This solution is also not scalable, since it relies on recursive calls to makeApiCall, and those calls pile up as the crawl grows.

Hope it helps.

You are using an if condition to check whether the number of concurrent requests is fewer than five. But remember that it is an if statement, not a loop. That means its body runs at most once per call to makeApiCall.

You are making a recursive call to your function makeApiCall inside the callback of the request. The callback of the request only runs once the request has completed.

With the above two points in mind: in your if condition you check whether concurrent_requests<5, then you call the request method, and your program goes idle. Some time later, when the request is fulfilled, the callback of the request runs and, after some logic, calls makeApiCall again. So in every call you issue request only once, then wait for it to resolve, and only then does your program proceed to the next request.
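The ordering described above can be seen in a small sketch. It assumes a hypothetical fakeRequest built on setTimeout as a stand-in for the request library, and a simplified makeOneCall in place of makeApiCall; neither name comes from the question.

```javascript
var events = [];

// fakeRequest is a hypothetical stand-in for request(url, callback):
// it invokes the callback later, after the calling function has returned.
function fakeRequest(url, callback) {
    setTimeout(function () { callback(null, {}, "<html></html>"); }, 10);
}

var concurrent_requests = 0;

// Simplified version of makeApiCall: one if check, at most one request.
function makeOneCall(url) {
    if (concurrent_requests < 5) {
        concurrent_requests++;
        events.push("started " + url);
        fakeRequest(url, function () {
            concurrent_requests--;
            events.push("callback for " + url);
        });
    }
    events.push("returned with " + concurrent_requests + " in flight");
}

makeOneCall("https://angularjs.org/");
events.push("script end");

// Print the recorded order once the fake request has had time to finish.
var done = new Promise(function (resolve) {
    setTimeout(function () { resolve(events); }, 50);
});
done.then(function (log) { console.log(log.join("\n")); });
```

The "callback for" entry is recorded last, after both the function and the script body have finished, which is why a single call never starts more than one request.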

If you want concurrent requests, use a loop like this:

function makeApiCall(url){
    if(url) {
        queue.unshift(url);
    }
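    // Use a loop here so that one call fills every free slot
    while(concurrent_requests<5) {
        // const instead of var: each request callback keeps its own nextUrl
        const nextUrl = queue.pop();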
        if(nextUrl) {
            concurrent_requests++;
            request(nextUrl, function (error, response, body) {
                var invalidUrl;
                concurrent_requests--;
                if(body) {
                        ...
                        if (!invalidUrl) {
                            makeApiCall(url);
                            data += url + ", " + nextUrl + "\n";
                        }
                    }
                    ...
                }
                else{
                    makeApiCall();
                }
            });
        }
        else{
           // Remember to break out of loop when queue is empty to avoid infinite loop.
           break;
        }
    }
     console.log(concurrent_requests);

}
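To check that a loop-based version really stays at the limit, here is a self-contained sketch that runs without network access. It makes several assumptions: fakeRequest, maxPages, peak, completed and onIdle are hypothetical names introduced for the demo, and the fake "pages" each report two child links until 40 pages exist.

```javascript
// Hypothetical stand-in for request(): "fetches" after a short delay and
// reports two child links until a fixed number of pages has been created.
var pagesCreated = 1;   // counts the root page
var maxPages = 40;      // hypothetical cap so the demo terminates
function fakeRequest(url, callback) {
    setTimeout(function () {
        var links = [];
        for (var i = 0; i < 2 && pagesCreated < maxPages; i++) {
            links.push(url + "/" + i);
            pagesCreated++;
        }
        callback(null, links);
    }, 5);
}

var LIMIT = 5;
var concurrent_requests = 0;
var peak = 0;        // highest number of requests seen in flight
var completed = 0;   // number of finished requests
var queue = [];
var onIdle;          // called when nothing is queued or in flight

function makeApiCall(url) {
    if (url) { queue.unshift(url); }
    // Fill every free slot; stop when the queue runs dry.
    while (concurrent_requests < LIMIT && queue.length > 0) {
        const nextUrl = queue.pop();
        concurrent_requests++;
        peak = Math.max(peak, concurrent_requests);
        fakeRequest(nextUrl, function (error, links) {
            concurrent_requests--;
            completed++;
            links.forEach(function (link) { queue.unshift(link); });
            makeApiCall(); // refill free slots after queuing all links
            if (concurrent_requests === 0 && queue.length === 0) { onIdle(); }
        });
    }
}

var finished = new Promise(function (resolve) { onIdle = resolve; });
makeApiCall("root");
finished.then(function () {
    console.log("completed:", completed, "peak concurrency:", peak);
});
```

Unlike the snippet above, this sketch queues all links from a page first and then calls makeApiCall once. That is a design choice for the demo; calling it once per link would work too, since the loop only starts requests while slots are free.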
