Iterating a node.js request function

Question

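This question is about a crawler in node.js. It is given a start_url, which it crawls for URLs and "pushes" them to a .json file (output.json). At the moment it only runs the request function with the start_url and saves the collected URLs in output.json. I want it to use the saved URLs by replacing the start_url with the first collected URL and collecting links again... and so on...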

var fs = require('fs');
var request = require('request');
var cheerio = require('cheerio');

var start_url = ["http://stackoverflow.com/"]

var req = function(url){
    request(url, function(error, response, html){
        var $ = cheerio.load(html);

        var data = [];

        $("a").each(function() {
            var link = $(this);
                var exurls = {exurl: new Array(link.attr("href"))}

                data.push(exurls);

                // Queue "exurls" for "start_url" and call the same function with the new URL (endless loop)
                // save to "output.json" from time to time, so you can stop it anytime
        });

        fs.writeFile("output.json", JSON.stringify(data, null, 4), function(err){
            if(err){
                console.log(err);
            } else {
                console.log("File successfully written!");
            }
        });
    });
}
for (var i = 0; i < start_url.length; i++){
    req(start_url[i]);
}

Answer 1

So what you can do is call the function recursively. The example below should work:

var fs = require('fs');
var request = require('request');
var cheerio = require('cheerio');

var start_url = ["http://stackoverflow.com/"];
// Index of the URL being crawled; kept outside req so it survives across calls
var count = 0;

var req = function(url){
    request(url, function(error, response, html){
        if (error) {
            console.log(error);
        } else {
            var $ = cheerio.load(html);

            $("a").each(function() {
                var href = $(this).attr("href");
                // Queue the link so the same function is called with it later
                if (href) {
                    start_url.push(href);
                }
            });
        }

        // Save to "output.json" after every page, so you can stop it anytime
        try {
            fs.writeFileSync("output.json", JSON.stringify(start_url, null, 4));
            console.log("File successfully written!");
        } catch(err) {
            console.log(err);
        }

        ++count;
        if (start_url.length > count) {
            req(start_url[count]);
        }
    });
};

req(start_url[0]);

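The problem is that you completely rewrite the file every time, and the list of collected URLs is kept in memory as it grows. If this goes on for a while, you will run out of memory. Another option is to create a write stream: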

var fs = require('fs');
var request = require('request');
var cheerio = require('cheerio');

var start_url = ["http://stackoverflow.com/"];
var wstream = fs.createWriteStream("output.json");

var req = function(url){
    request(url, function(error, response, html){
        if (error) {
            console.log(error);
        } else {
            var $ = cheerio.load(html);

            $("a").each(function() {
                var href = $(this).attr("href");
                if (href) {
                    // Queue the link so the same function is called with it later
                    start_url.push(href);
                    // Append to the file right away, so you can stop it anytime.
                    // The file ends up as a comma-separated list of quoted URLs,
                    // not a strict JSON document.
                    wstream.write(JSON.stringify(href) + ",\n");
                }
            });
        }

        // Drop the URL that was just crawled and continue with the next one
        start_url.shift();
        if (start_url.length > 0) {
            return req(start_url[0]);
        }
        wstream.end();
    });
};

req(start_url[0]);
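One thing to watch for when feeding collected links back into request: href values are often relative ("/questions"), fragments ("#top"), or non-HTTP links ("mailto:..."), and those cannot be requested as they are. A minimal sketch of normalizing them first, assuming a Node.js version that provides the global URL class; the helper name toAbsolute is made up for this illustration and is not part of request or cheerio:

```javascript
// Hypothetical helper: turn an href found on the page `base` into an
// absolute http(s) URL, or return null if it should not be crawled.
var toAbsolute = function(href, base){
    if (!href) {
        return null; // <a> without an href attribute
    }
    var resolved;
    try {
        resolved = new URL(href, base); // resolves relative hrefs against base
    } catch (err) {
        return null; // not a parsable URL
    }
    if (resolved.protocol !== "http:" && resolved.protocol !== "https:") {
        return null; // skip mailto:, javascript:, etc.
    }
    resolved.hash = ""; // "#section" points at the same page, so drop it
    return resolved.href;
};
```

The value passed to start_url.push could be run through this helper with the page's own URL as base, and skipped when the result is null.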

Edit: switched to a basic queue so that the memory issue is addressed.
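To make the queue idea easier to see on its own, here is a rough sketch with the network part factored out. The names crawl, fetchLinks, maxPages and done are assumptions for this illustration; fetchLinks(url, cb) stands in for the request + cheerio step and is expected to call cb(error, arrayOfHrefs). Unlike the examples above, it also remembers which URLs were already queued and stops after maxPages pages, which is one possible way to keep the queue from growing forever:

```javascript
// Crawl from startUrl using a first-in, first-out queue.
var crawl = function(startUrl, fetchLinks, maxPages, done){
    var queue = [startUrl];          // URLs waiting to be crawled
    var seen = new Set([startUrl]);  // URLs that were queued at some point
    var visited = [];                // URLs crawled so far, in order

    var next = function(){
        if (queue.length === 0 || visited.length >= maxPages) {
            return done(visited);
        }
        var url = queue.shift();     // take the oldest URL off the queue
        fetchLinks(url, function(error, links){
            visited.push(url);
            if (!error) {
                links.forEach(function(link){
                    // Only queue links that have not been queued before
                    if (!seen.has(link)) {
                        seen.add(link);
                        queue.push(link);
                    }
                });
            }
            next();
        });
    };

    next();
};
```

With request and cheerio, fetchLinks would wrap the request(url, ...) call and collect the href values of all "a" elements before calling cb.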


Answer 1: score 2, accepted, 2016-04-14 14:44:49
