Dynamic web page scraping with npm cheerio and request

I'm trying to scrape data from a site that has a base URL followed by dynamic routes. This particular site simply uses numbers, so I have this code to get the data:

    for (var i = 1; i <= total; i++) {
      var temp = base_url + i;
      var result = "";
      request(temp, function(error, response, body) {
        var $ = cheerio.load(body);
        var address_string = 'http://maps.google.com/?q=' + $('title').text();
        //firebase
        database.ref('events/' + i).set({ "address": address_string });
      });
    }

However, the above code doesn't work and doesn't add anything to the database. Does anyone know what's wrong?

I'm not sure this is the whole reason, but one thing that will behave strangely in the code you wrote is that the variable i is declared with var, so it is shared by every request callback instead of being bound per iteration, and the for loop will finish before any callback is called.

If this is the problem, there should be only one db entry, stored under i === total + 1 (the value i holds once the loop has exited) and overwritten by each callback in turn.
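The effect can be reproduced without any network access. The sketch below is only an illustration: fakeRequest, pending, flush and the example.com URL are hypothetical stand-ins for request and the real site, and the callbacks are queued so that they run after the loop has finished, which is when real request callbacks would run too.

```javascript
// Hypothetical stand-in for request(): it queues the callback instead of
// doing network I/O, so every callback runs only after the loop is done.
const pending = [];
function fakeRequest(url, callback) {
  pending.push(function () {
    callback(null, {}, '<title>' + url + '</title>');
  });
}

// Run all queued callbacks, in the order they were queued.
function flush() {
  while (pending.length > 0) {
    pending.shift()();
  }
}

const total = 3;                        // assumed value for the demo
const base_url = 'http://example.com/'; // assumed placeholder URL

// Keys the callbacks would write to, recorded instead of touching a database.
const keysWithVar = [];

for (var i = 1; i <= total; i++) {
  fakeRequest(base_url + i, function (error, response, body) {
    // All callbacks share the single variable i declared with var.
    keysWithVar.push('events/' + i);
  });
}

flush();
console.log(keysWithVar); // [ 'events/4', 'events/4', 'events/4' ]
```

Every callback sees i as total + 1, so all writes would land on the same key.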

This can be solved by iterating with Array.forEach instead, so that each callback gets its own copy of the index.
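A minimal sketch of the forEach approach follows. Again, fakeRequest, queue, written and the example.com URL are hypothetical stand-ins (for request, its deferred callbacks, the Firebase database and the real site), used so the example runs on its own; in the real code the callback body would call cheerio.load(body) and database.ref(...).set(...) as in the question.

```javascript
// Hypothetical stand-in for request(): queues callbacks so they run
// after the iteration has finished, like real asynchronous responses.
const queue = [];
function fakeRequest(url, callback) {
  queue.push(function () {
    callback(null, {}, '<title>Page ' + url + '</title>');
  });
}

const total = 3;                        // assumed value for the demo
const base_url = 'http://example.com/'; // assumed placeholder URL
const written = {};                     // stand-in for the database

// Build the array [1, 2, ..., total] to iterate over.
const ids = Array.from({ length: total }, function (_, k) {
  return k + 1;
});

ids.forEach(function (id) {
  // id is a parameter of this function, so each iteration has its own copy.
  fakeRequest(base_url + id, function (error, response, body) {
    if (error) {
      return; // skip failed requests instead of writing bad data
    }
    written['events/' + id] = body;
  });
});

// Simulate the responses arriving after the loop.
queue.forEach(function (run) {
  run();
});

console.log(Object.keys(written)); // [ 'events/1', 'events/2', 'events/3' ]
```

Declaring the counter with let in the original for loop (for (let i = 1; ...)) should have the same effect, since let creates a fresh binding for each iteration.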
