
Load specific HTML by using cheerio in Node.js?

I need to get the URLs of all <a> tags from a given webpage, but I need to skip any <a> tags inside the header and footer tags. I am loading the body tag's HTML without the header tag. Here is my code, but it doesn't work.

var $ = cheerio.load(html);
$ = cheerio.load($('body').not('header'));

var links = $("a']");
links.each(function() {
    console.log($(this).attr('href'));
});

If the above code is wrong, please suggest how to do this.

Cheerio works much like jQuery.

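var cheerio = require('cheerio');

var $ = cheerio.load(html);
// Select every link in the body, then drop the ones nested in <header> or <footer>.
// Note: $('body').not('header') would not help, because .not() only filters
// the matched set (<body> itself), not its descendants.
var links = $('body a').not('header a, footer a');

links.each(function() {
    // Cheerio nodes are not browser DOM elements, so read the attribute with attr().
    console.log($(this).attr('href'));
});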

I think the error was because you weren't loading HTML in your second load; you were loading the body object. You should be able to do it this way:

var $ = cheerio.load(html);
$ = cheerio.load($('body').html());

$('header').remove();

console.log($.html());

I did it like this and now it works fine. Can anyone tell me whether this is the right way to do it?

var $ = cheerio.load(body);
var t = $('body');
t.children('header').remove();
t.children('footer').remove();
var t = $.html(t);
var $ = cheerio.load(t);
var links = $("a");
links.each(function() {
    console.log($(this).attr('href'));
});

Thanks,
