Scraping URLs from a web page with Node.js

Question

I'm trying to scrape all URLs from a website and put them into an array, and I have a question about array indexing. If I add an index such as 2, as in array[2], the command line prints "undefined". If I remove the index and print the whole array, it prints all the URLs line by line. I want each URL to have its own index, like this:

array[0] = First URL found
array[1] = Second URL found
array[2] = Third URL found, etc.

Can anyone point me in the right direction? Thank you.

var request = require('request');
var cheerio = require('cheerio');

var url = 'http://www.hobo-web.co.uk/';

request(url, function(err, resp, body){
  $ = cheerio.load(body);
  links = $('a'); //use your CSS selector here
  $(links).each(function(i, link){
    var array = $(link).attr('href');
    console.log(array[2]);
  });
});

Answer 1

You need to create the array first, as a variable that is accessible inside the .each loop, and then keep pushing the href values onto it. In your code, array is reassigned to a single href string on every iteration, so it never holds a list of URLs.

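var request = require('request');
var cheerio = require('cheerio');

var url = 'http://www.hobo-web.co.uk/';

// Declared outside the .each loop so every iteration pushes to the same array.
var array = [];

request(url, function(err, resp, body){
  if (err) {
    return console.error(err);
  }
  var $ = cheerio.load(body);
  var links = $('a');
  links.each(function(i, link){
    var href = $(link).attr('href');
    array.push(href);
  });
  // The array is only filled once the response has arrived, so read it here.
  console.log(array[2]);
});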
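
Once the hrefs are in an array, some entries may be undefined (for <a> tags without an href), relative paths, or non-web links such as mailto:. The sketch below is one possible way to tidy the array up; it is not part of the original answer. The helper name normalizeLinks and the sample input are made up for illustration, and it relies only on the URL class built into Node.js.

```javascript
// Hypothetical helper (an assumption, not from the original answer):
// resolves raw href values against the page URL and drops unusable entries.
function normalizeLinks(hrefs, baseUrl) {
  var seen = new Set();
  var result = [];
  hrefs.forEach(function (href) {
    // attr('href') yields undefined for <a> tags that have no href attribute.
    if (typeof href !== 'string' || href.trim() === '') {
      return;
    }
    var parsed;
    try {
      // Resolve relative paths such as '/about' against the page URL.
      parsed = new URL(href, baseUrl);
    } catch (e) {
      return; // skip values that cannot be parsed as a URL
    }
    // Keep only web links; this drops mailto:, javascript:, and similar schemes.
    if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
      return;
    }
    // Skip URLs that were already collected.
    if (!seen.has(parsed.href)) {
      seen.add(parsed.href);
      result.push(parsed.href);
    }
  });
  return result;
}

// Sample input standing in for the array filled inside the request callback.
var sample = [
  '/about',
  undefined,
  'http://www.hobo-web.co.uk/',
  'mailto:someone@example.com',
  'contact.html',
  '/about'
];
var urls = normalizeLinks(sample, 'http://www.hobo-web.co.uk/');
console.log(urls);
```

In the answer's code this would be called inside the request callback, after the .each loop, for example as normalizeLinks(array, url).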

Answer 1
2 · Accepted · 2017-03-22 01:00:29
