Scraping URLs from a web page with Node.js
I'm trying to scrape all the URLs from a website and put them into an array, and I have a question about array indexing. If I index the array, for example array[2], the command line prints "undefined". If I remove the index and print the whole variable, it prints all the URLs, one per line. I want each URL to sit at its own index in the array.
Can anyone point me in the right direction? Thank you.
var request = require('request');
var cheerio = require('cheerio');
var url = 'http://www.hobo-web.co.uk/';
request(url, function(err, resp, body){
    $ = cheerio.load(body);
    links = $('a'); //use your CSS selector here
    $(links).each(function(i, link){
        var array = $(link).attr('href');
        console.log(array[2]);
    });
});
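A note on why the snippet above can print "undefined": inside the loop, `array` is not an array at all but a single href string, so `array[2]` is the third character of that string, or `undefined` when the string is shorter than three characters. The sketch below reproduces this without any network access. The `hrefs` list is a made-up stand-in for the values `.attr('href')` would return, not data from the real page.

```javascript
// Hypothetical href values, standing in for what $(link).attr('href') returns.
var hrefs = ['/', '#', 'http://www.hobo-web.co.uk/'];

// What the question's loop does: index into each string separately.
var perString = hrefs.map(function (href) {
    var array = href;   // a string, despite the name
    return array[2];    // third character of that string, if it has one
});

// What was intended: one array holding every href, indexed once at the end.
var collected = [];
hrefs.forEach(function (href) {
    collected.push(href);
});

console.log(perString);    // undefined, undefined, 't'
console.log(collected[2]); // http://www.hobo-web.co.uk/
```

Only the second form gives each URL its own index.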
You need to create the array first, as a variable that is accessible inside the .each loop, and then push each href value onto it.
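var request = require('request');
var cheerio = require('cheerio');
var url = 'http://www.hobo-web.co.uk/';
var array = []; // declared once, outside the loop, so every iteration can push to it

request(url, function (err, resp, body) {
    if (err) {
        console.error(err);
        return;
    }
    var $ = cheerio.load(body);
    var links = $('a'); // use your CSS selector here
    links.each(function (i, link) {
        var href = $(link).attr('href');
        array.push(href); // one URL per index
    });
    // The array is only filled once the request has completed, so read it here.
    console.log(array[2]);
});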
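One thing to keep in mind with this approach: `request` calls its callback asynchronously, so `array` is still empty on the line right after the `request(...)` call and is only filled once the callback has run. The sketch below illustrates the timing with a stand-in function. `fakeRequest`, `scrape`, and the hard-coded hrefs are hypothetical and only mimic the callback style of `request`, so the example runs without a network.

```javascript
// Hypothetical stand-in for request(): delivers its result on a later tick,
// the way a real HTTP call would, using only Node's built-in setImmediate.
function fakeRequest(url, callback) {
    setImmediate(function () {
        callback(null, ['/about', '/contact', url]);
    });
}

var array = [];
var lengthRightAfterCall;

function scrape(done) {
    fakeRequest('http://www.hobo-web.co.uk/', function (err, hrefs) {
        if (err) { return done(err); }
        hrefs.forEach(function (href) {
            array.push(href);
        });
        done(null, array); // hand the filled array on from inside the callback
    });
    // Runs before the callback above: nothing has been pushed yet.
    lengthRightAfterCall = array.length;
}

scrape(function (err, urls) {
    if (err) { throw err; }
    console.log(lengthRightAfterCall); // 0
    console.log(urls[2]);              // http://www.hobo-web.co.uk/
});
```

Any code that needs the URLs should therefore run inside the callback, or be called from it, rather than after the `request(...)` line.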