Scraping links from html elements - casperjs

I am currently trying to scrape links and thumbnails from this site with the help of casperjs. I was able to easily figure out the HTML structure (shown below). I am trying to extract the link found in the href attribute of every a tag. When I run my script, I get an error for video_links. How could I go about scraping all links and thumbnails and outputting them in an array?

Error

TypeError: 'undefined' is not an object (evaluating 'video_links.length')

Script

var casper = require('casper').create({}),video_links,video_thumbnails;

//Functions
function getLinks() {
    var element = document.querySelectorAll('.cne-episode-block a');
    return Array.prototype.map.call(element, function(e) {
        return e.getAttribute('href');
    });
}
casper.start('http://video.wired.com/');

casper.then(function() {
    video_links = this.evaluate(getLinks);
});

casper.run( this.echo(video_links.length + ' links found.') ); 

HTML

<div class="cne-thumb-grid-container cne-context-container">
    <div class="cne-thumb cne-episode-block " data-videoid="551dc13461646d11aa020000">
        <div class="cne-thumb-image cne-rollover" data-powertiptarget="551dc13461646d11aa020000">
            <a class="cne-thumbnail cne-zoom-effect js-ajax-video-load" href="/watch/angry-nerd-will-netflix-s-daredevil-fly-or-flop" data-video-series="Angry Nerd" data-video-series-id="518d55c268f9dac897000003" data-video-id="551dc13461646d11aa020000" data-video-categories="[&quot;Movies \u0026 TV&quot;]">
                <img class="cne-video-thumb" src="http://dwgyu36up6iuz.cloudfront.net/heru80fdn/image/upload/c_fill,d_placeholder_thescene.jpg,fl_progressive,g_face,h_151,q_80,w_270/v1428076783/wired_angry-nerd-will-netflix-s-daredevil-fly-or-flop.jpg" alt="Will Netflix’s Daredevil Fly or Flop?">
                <div class="cne-thumbnail-play">Play</div>
            </a>
        </div>
        <div class="cne-thumb-title the-thumb-title">
            <a class="js-ajax-video-load" href="/watch/angry-nerd-will-netflix-s-daredevil-fly-or-flop" data-video-id="551dc13461646d11aa020000">Will Netflix’s Daredevil Fly or Flop?</a>
            <div class="cne-thumb-subtitle">
                <a href="/series/angry-nerd">Angry Nerd</a>
            </div>
        </div>
        <div id="551dc13461646d11aa020000" class="cne-thumb-rollover">
            <div class="cne-thumb-rollover-box">
                <span class="cne-rollover-category"> Movies & TV </span>
                <span class="cne-rollover-name"> Will Netflix’s Daredevil Fly or Flop? </span>
                <span class="cne-rollover-description"> If Netflix’s new Daredevil series is anything like Ben Affleck’s Daredevil film, we’re all in trouble. Angry Nerd explains what the latest incarnation needs to get right to make sure the man without fear doesn’t turn into a total flop. </span>
            </div>
        </div>
    </div>
</div>

If the classes are on the same element, you only need one of them. So use either cne-thumb or cne-episode-block in your querySelectorAll call, not both.
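
To get the links and thumbnails together as one array, something along these lines might work. It is only a sketch based on the HTML posted in the question: collectVideos and root are names made up for this example, and the selector assumes every video block has an a.cne-thumbnail wrapping an img.cne-video-thumb, as in the sample markup. The wiring in the trailing comments is not executed here. It passes a function to casper.run(), which takes a completion callback as its first argument. In the posted script the this.echo(...) call is evaluated right away, before any step has run, which would explain why video_links is still undefined.

```javascript
// Sketch only: collectVideos and root are illustrative names, not CasperJS API.
function collectVideos(root) {
    // casper.evaluate() calls this with no arguments, so fall back to the page's document.
    root = root || document;
    // One class on the block is enough; a.cne-thumbnail is the anchor that wraps the image.
    var anchors = root.querySelectorAll('.cne-episode-block a.cne-thumbnail');
    return Array.prototype.map.call(anchors, function (a) {
        var img = a.querySelector('img.cne-video-thumb');
        return {
            link: a.getAttribute('href'),
            thumbnail: img ? img.getAttribute('src') : null
        };
    });
}

// Possible wiring inside the CasperJS script (assumed, not run here):
//   var videos = [];
//   casper.then(function () {
//       videos = this.evaluate(collectVideos) || [];
//   });
//   casper.run(function () {
//       this.echo(videos.length + ' videos found.');
//       this.exit();
//   });
```

Each entry in the returned array is a plain object with a link and a thumbnail property, so it survives the trip back from the page context. The href values in the sample markup are relative (for example /watch/...), so they would still need the site's base URL prepended before being opened.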
