Node.js web-scraping function using Cheerio returns before it has finished
I'm currently working on a simple web-scraping program in Node.js. It is based on Cheerio: I fetch items from a website and extract some information from them.
As far as I understand, all the functions I call inside the `.each()` loop are synchronous, so they should execute from top to bottom. And because the loop itself is also just a normal loop, which executes synchronously in JavaScript, the function should return my finished array. Instead, I get undefined, yet when I log the array directly inside the callback it works(?).
const cheerio = require('cheerio');
const request = require('request');

function getIntensiv(){
  var intensivregister = [];
  request.post({url: 'SOMEURL', form: {SOMEFORM}}, function(err, res, body){
    var $ = cheerio.load(body);
    $('#dataList').children('tbody').children('tr').each(function(i, elem){
      var name = $(elem).children('td').first().text().trim().split("\n")[0].trim();
      var zipcity = $(elem).children('td').first().children('small').last().text();
      var streetnr = $(elem).children('td').first().children('br').last().prev().text();
      intensivregister.push({'name': name, 'zipcity': zipcity, 'streetnr': streetnr});
    });
    console.log(intensivregister); // works and prints the finished array
    return intensivregister; // I expected getIntensiv() to return this, but the caller gets undefined
  });
}
I would appreciate it if you could explain where my mistake is and help me fix it.
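The premise in the question is correct as far as it goes: Cheerio's `.each()` does run synchronously. What it misses is that the whole loop sits inside the callback passed to request.post(), and request calls that callback only after the HTTP response has arrived, long after getIntensiv() has already returned. The sketch below is an illustration under assumptions: it replaces the real request with a hypothetical fakeRequest() that uses setTimeout to simulate network delay (so it runs offline, without third-party packages), and it records the order in which things happen.

```javascript
// Hypothetical stand-in for request.post(): it calls `callback` later,
// after the current synchronous code has finished (like a real HTTP request).
function fakeRequest(callback) {
  setTimeout(() => callback(null, ['row 1', 'row 2']), 10);
}

const log = []; // records the order in which things happen

function getIntensiv() {
  const intensivregister = [];
  fakeRequest(function (err, rows) {
    // This loop IS synchronous, but the whole callback only runs
    // once the (simulated) response has arrived.
    rows.forEach((row) => intensivregister.push(row));
    log.push('callback finished: ' + intensivregister.length + ' items');
    return intensivregister; // returns from this callback, not from getIntensiv()
  });
  log.push('getIntensiv() is about to return');
  // No return statement here, so the caller gets undefined.
}

const result = getIntensiv();
log.push('caller received: ' + result);

// Print the recorded order once the simulated request has completed.
setTimeout(() => console.log(log.join('\n')), 50);
```

Running it prints "getIntensiv() is about to return", then "caller received: undefined", and only then "callback finished: 2 items": the caller has already moved on before the array is filled.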
const cheerio = require('cheerio');
const request = require('request');

function getIntensiv(){
  var intensivregister = [];
  request.get({url: 'https://www.w3schools.com/html/html_tables.asp'}, function(err, res, body){
    var $ = cheerio.load(body);
    $('#customers').children('tbody').children('tr').each(function(i, elem){
      var name = $(elem).children('td').first().text().trim().split("\n")[0].trim();
      var zipcity = $(elem).children('td').first().children('small').last().text();
      var streetnr = $(elem).children('td').first().children('br').last().prev().text();
      intensivregister.push({'name': name, 'zipcity': zipcity, 'streetnr': streetnr});
    });
    console.log(intensivregister); // works and prints the finished array
    return null; // only exits this callback; it does not return from getIntensiv()
  });
  return null; // <--- THIS is what getIntensiv() returns, not the return above. Without it, the caller gets undefined.
}

var returnVal = getIntensiv();
console.log(returnVal); // null, printed before the request has even completed
Note the comment on the outer `return null`: that statement, not the one inside the callback, determines what getIntensiv() returns.
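The same rule holds even when a callback runs synchronously: `return` inside a callback ends only that callback. The hypothetical helpers below (firstBigNumber, firstBigNumberFixed) use Array.prototype.forEach, which, like Cheerio's `.each()`, calls its callback synchronously. In the synchronous case the result already exists when the outer function returns, so it can be returned directly (here via Array.prototype.find); in the asynchronous request case it does not exist yet, which is why the approaches below switch to callbacks or promises.

```javascript
// Even when a callback runs synchronously (like Array.prototype.forEach),
// `return` inside it only ends that one callback invocation.
function firstBigNumber(numbers) {
  numbers.forEach((n) => {
    if (n > 10) {
      return n; // exits this arrow function only; forEach ignores the value
    }
  });
  // Execution always reaches this point; nothing above returned from
  // firstBigNumber itself.
  return null;
}

// Synchronous fix: use a construct whose result can be returned directly.
function firstBigNumberFixed(numbers) {
  const found = numbers.find((n) => n > 10);
  return found === undefined ? null : found;
}

console.log(firstBigNumber([3, 42, 7]));      // null
console.log(firstBigNumberFixed([3, 42, 7])); // 42
```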
OK, I figured out that my idea of JavaScript was not how it is meant to be used. I worked around my problem by dropping the idea of returning values from functions (a habit that mainly comes from my experience with synchronous programming) and instead passing a callback parameter to my function, which I call at the end of my request.
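function getIntensiv(callback){
  request.post(..., function(err, res, body){
    var intensivregister = [];
    // ...parse body with Cheerio and fill intensivregister, as in the question...
    callback(intensivregister); // call back only here, once the array is complete
  });
}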
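A runnable sketch of the callback approach, hedged in two ways. First, the real request.post() call and the Cheerio parsing are replaced by a hypothetical fakeRequest() that delivers a tiny fake body after a short delay, so the example runs offline. Second, the callback follows Node's error-first `(err, result)` convention instead of the single-argument `callback(intensivregister)` used above, so that request errors also reach the caller.

```javascript
// Hypothetical stand-in for request.post(): delivers a fake body
// asynchronously, using Node's error-first callback convention.
function fakeRequest(callback) {
  setTimeout(() => callback(null, 'Alpha Clinic\nBeta Hospital'), 10);
}

function getIntensiv(callback) {
  fakeRequest(function (err, body) {
    if (err) {
      callback(err); // pass failures on instead of silently dropping them
      return;
    }
    const intensivregister = [];
    // Stand-in for the Cheerio parsing loop: one entry per line of the body.
    body.split('\n').forEach((name) => intensivregister.push({ name: name.trim() }));
    // Call the callback only here, once the array is complete.
    callback(null, intensivregister);
  });
}

getIntensiv(function (err, intensivregister) {
  if (err) {
    console.error('Request failed:', err);
    return;
  }
  console.log(intensivregister);
});
```

The caller passes a function instead of reading a return value, so everything that needs intensivregister has to happen inside that function (or in code it calls).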
What also works (and I think it is a better solution) is using promises, e.g. with request-promise, and calling the callback in the `finally` call.
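A minimal sketch of the promise-based approach, assuming Node.js 10 or later (for util.promisify, async/await and Promise.prototype.finally). It does not use request-promise; instead it wraps a hypothetical fakeRequest() stand-in with Node's built-in util.promisify so the example runs offline. The split-by-line parsing is likewise only a placeholder for the Cheerio code. getIntensiv() now returns a Promise that resolves to the finished array.

```javascript
const { promisify } = require('util');

// Hypothetical stand-in for request.post(), using an error-first callback.
function fakeRequest(callback) {
  setTimeout(() => callback(null, 'Alpha Clinic\nBeta Hospital'), 10);
}

// Turn the callback API into one that returns a Promise of the body.
const fetchBody = promisify(fakeRequest);

// getIntensiv can now "return" the data: it returns a Promise of the array.
async function getIntensiv() {
  const body = await fetchBody();
  // Stand-in for the Cheerio parsing step.
  return body.split('\n').map((name) => ({ name: name.trim() }));
}

getIntensiv()
  .then((intensivregister) => console.log(intensivregister))
  .catch((err) => console.error('Request failed:', err))
  .finally(() => console.log('done'));
```

`.finally()` handlers are called without arguments, so the array itself is best consumed in `.then()` (or with `await` inside another async function); `.finally()` suits clean-up work that must run whether the request succeeded or failed.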