
Nodejs Webscraping function using Cheerio returns before finished

I'm currently working on a simple web scraping Node.js program. It is based on cheerio: I fetch items from a website and extract some information from them.

As far as I understand, all the functions I call inside the foreach loop are synchronous, so they should execute from top to bottom. And because the foreach loop is itself just a normal loop, which runs synchronously in JS, the function should return my finished array. Instead, the caller gets undefined, yet when I log the array to the console directly inside the function, it works(?).

function getIntensiv(){
    var intensivregister = [];
    request.post({url: 'SOMEURL', form: {SOMEFORM}}, function(err,res,body){
        var $ = cheerio.load(body);
        $('#dataList').children('tbody').children('tr').each(function(i, elem){
            var name = $(elem).children('td').first().text().trim().split("\n")[0].trim();                        
            var zipcity = $(elem).children('td').first().children('small').last().text();                         
            var streetnr = $(elem).children('td').first().children('br').last().prev().text();                    
            intensivregister.push({'name': name, 'zipcity': zipcity, 'streetnr': streetnr});
        });
        console.log(intensivregister);  //works and prints the finished array
        return intensivregister;        //returns undefined before function finished
    });
}

I would appreciate it if you could explain where my mistake is and help me fix it.

const cheerio = require('cheerio');
const request = require('request');

function getIntensiv(){
    var intensivregister = [];
    request.get({url: 'https://www.w3schools.com/html/html_tables.asp'}, function(err, res, body){
        var $ = cheerio.load(body);
        $('#customers').children('tbody').children('tr').each(function(i, elem){
            var name = $(elem).children('td').first().text().trim().split("\n")[0].trim();
            var zipcity = $(elem).children('td').first().children('small').last().text();
            var streetnr = $(elem).children('td').first().children('br').last().prev().text();
            intensivregister.push({'name': name, 'zipcity': zipcity, 'streetnr': streetnr});
        });
        console.log(intensivregister);  // works and prints the finished array
        return null;  // returns from this callback only; the value is discarded
    });
    return null;  // <--- This return is the one the caller of getIntensiv receives, not the one above. If no return statement is written here, undefined is passed.
}

var returnVal = getIntensiv();
console.log(returnVal);

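See the highlighted comment: the return at the end of getIntensiv is what the caller receives. The return inside the request callback only returns from that callback, and it runs later, after getIntensiv has already returned.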
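To make the point above reproducible without installing anything, here is a minimal sketch. `fakeRequest` and `getIntensivDemo` are hypothetical names used only for illustration; `fakeRequest` stands in for `request.get` by calling its callback later through `setTimeout` and is not part of any library.

```javascript
// Hypothetical stand-in for request.get: invokes the callback asynchronously.
function fakeRequest(url, callback) {
    setTimeout(function () {
        callback(null, { statusCode: 200 }, '<html>...</html>');
    }, 10);
}

function getIntensivDemo() {
    var rows = [];
    fakeRequest('SOMEURL', function (err, res, body) {
        rows.push({ name: 'example' });
        // This value goes to whoever invoked the callback (fakeRequest here),
        // which ignores it. It never reaches the caller of getIntensivDemo.
        return rows;
    });
    // No return statement here, so getIntensivDemo returns undefined,
    // and it does so before the callback above has run.
}

var result = getIntensivDemo();
console.log(result); // undefined
```

The callback still runs about 10 ms later and fills `rows`, but by then nothing is waiting for that value.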

OK, I figured out that my idea of JavaScript was not how you should use it. I worked around my problem by getting rid of the idea of returning values from functions (which comes mainly from my experience with synchronous programming) and instead using a callback parameter, which I pass to my function and call at the end of my request.

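function getIntensiv(callback){
    var intensivregister = [];
    request.post({url: 'SOMEURL', form: {SOMEFORM}}, function(err, res, body){
        // ... fill intensivregister with cheerio as before ...
        callback(intensivregister);  // called inside the request callback, once the data is ready
    });
}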
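A common refinement in Node.js is the error-first callback convention, where the first argument carries an error (or `null`) and the second carries the result. The following is a sketch under assumptions, not the asker's actual code: `fakeRequest` is a hypothetical stand-in for `request.post`, and splitting the body by line replaces the cheerio parsing so the example runs without any packages.

```javascript
// Hypothetical stand-in for request.post: delivers a result asynchronously.
function fakeRequest(options, callback) {
    setTimeout(function () {
        if (!options.url) {
            callback(new Error('missing url'));
            return;
        }
        callback(null, { statusCode: 200 }, 'Alpha\nBeta');
    }, 10);
}

// Error-first callback: callback(err) on failure, callback(null, rows) on success.
function getIntensiv(callback) {
    fakeRequest({ url: 'SOMEURL' }, function (err, res, body) {
        if (err) {
            callback(err);
            return;
        }
        // Stand-in for the cheerio parsing: one entry per line of the body.
        var intensivregister = body.split('\n').map(function (line) {
            return { name: line.trim() };
        });
        callback(null, intensivregister);
    });
}

getIntensiv(function (err, intensivregister) {
    if (err) {
        console.error('request failed:', err.message);
        return;
    }
    console.log(intensivregister); // runs only after the request has finished
});
```

Everything that depends on the scraped data has to live inside the function passed to `getIntensiv`, or be called from it.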

What also works (and I think is the better solution) is using promises, e.g. with request-promise, and calling the callback in the finally call.
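The promise idea can be sketched roughly as follows. This does not use request-promise itself: `fakeRequest` and `requestAsPromise` are hypothetical helpers that wrap a callback-style function in a plain `Promise`, and line splitting stands in for the cheerio parsing. The result is consumed in `then` here rather than `finally`, because a `finally` handler receives no value.

```javascript
// Hypothetical stand-in for a callback-style request function.
function fakeRequest(options, callback) {
    setTimeout(function () {
        callback(null, { statusCode: 200 }, 'Alpha\nBeta');
    }, 10);
}

// Wrap the callback API in a Promise so a result can be handed to the caller.
function requestAsPromise(options) {
    return new Promise(function (resolve, reject) {
        fakeRequest(options, function (err, res, body) {
            if (err) {
                reject(err);
            } else {
                resolve(body);
            }
        });
    });
}

// An async function always returns a Promise; await pauses until the body arrives.
async function getIntensiv() {
    var body = await requestAsPromise({ url: 'SOMEURL' });
    // Stand-in for the cheerio parsing: one entry per line of the body.
    return body.split('\n').map(function (line) {
        return { name: line.trim() };
    });
}

getIntensiv()
    .then(function (intensivregister) {
        console.log(intensivregister);
    })
    .catch(function (err) {
        console.error('request failed:', err.message);
    });
```

With this shape, `getIntensiv` can use `return` again, but the caller receives a Promise of the array and must use `then` or `await` to get the array itself.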
