
Dynamic links in Node.js / cheerio / x-ray

Here is what I am trying to accomplish. I can scrape a web page and extract the information I need, and I have already run this on a couple of websites where the pagination links are readily available in the href attribute. My question is: how do I navigate to the next page when the pagination link is dynamic, as here:

<ul>
    <li>
        <a class="clickPage" href="javascript:previousPage()">1</a>
    </li>
    <li>
        <a class="clickPage active" href="javascript:currentPage()">2</a>
    </li>
    <li>
        <a class="clickPage" href="javascript:nextPage()">Next Page</a>
    </li>
</ul>

Here is the code I have working for other sites:

var request = require('request'),       // simplified HTTP request client
    cheerio = require('cheerio'),       // lean implementation of core jQuery
    Xray = require('x-ray'),            // web scraper with pagination support
    x = Xray(),
    fs = require('fs');                 // file system i/o

/*
    TODO: Make this feature dynamic, to take in the URL of the page
    var pageUrl;
*/

var status = 'for sale';
var counter = 0;

x('http://www.example.com/results/1', '.results', [{
    id: 'div.grid@id',    // extracts the value from the attribute id
    title: 'div.info h2',
    category: 'span.category',
    price: 'p.price',
    count: counter+1,    // why doesnt this update? this never shows in the json
    status: status       // this value never shows up in the json
}])
  .paginate(whatShouldThisBe)
  .limit(800)
  .write('products.json');

Also, the values of count and status never show up in the generated JSON file. I am not sure what I am doing wrong here, and would appreciate any help.

Thanks!
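
On the missing count and status fields: as far as I can tell, x-ray treats the string values in the schema as selectors, not as literal values, so 'for sale' would be looked up as a selector and match nothing. Separately, counter+1 is evaluated only once, when the object literal is built, so it could never increase per item. One possible workaround is to add those fields after scraping. The sketch below is only an illustration: addStaticFields is a hypothetical helper, and the sample items are made-up stand-ins for real results.

```javascript
// Hypothetical post-processing step: attach a literal status and a running
// count to each scraped item once the scraper has returned its results.
function addStaticFields(items, status) {
  return items.map(function (item, index) {
    // Copy the item so the original result objects are left untouched.
    return Object.assign({}, item, { count: index + 1, status: status });
  });
}

// Stand-in data shaped like the schema in the question (values are made up).
var scraped = [
  { id: 'grid-1', title: 'First item' },
  { id: 'grid-2', title: 'Second item' }
];

console.log(JSON.stringify(addStaticFields(scraped, 'for sale'), null, 2));
```

With x-ray itself, the results should be reachable through the callback form, x(url, scope, schema)(function (err, results) { ... }), after which the enriched array can be written out with fs.writeFile in place of .write('products.json').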

Have you tried .paginate('ul li:nth-child(3) a@href')?

That selector takes the href of the link inside the third <li> of the <ul>.
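
One caveat: with the markup in the question, that selector resolves to javascript:nextPage(), which is a script call and not an address an HTTP client can request, so the crawl would probably stop after the first page. If the site exposes its pages as /results/1, /results/2 and so on (an assumption based only on the start URL in the question), the page URLs could be generated directly. In this rough sketch, pageUrls and isFollowableHref are hypothetical helpers, not part of x-ray or cheerio.

```javascript
// Hypothetical helper: returns false for javascript: pseudo-links, which
// cannot be fetched like a normal URL.
function isFollowableHref(href) {
  return !/^\s*javascript:/i.test(href);
}

// Hypothetical helper: build result-page URLs by incrementing the trailing
// page number of the first URL. Assumes the URL ends in the page number.
function pageUrls(firstUrl, pageCount) {
  var match = /^(.*?)(\d+)$/.exec(firstUrl);
  if (!match) {
    throw new Error('URL does not end in a page number: ' + firstUrl);
  }
  var prefix = match[1];
  var first = parseInt(match[2], 10);
  var urls = [];
  for (var i = 0; i < pageCount; i++) {
    urls.push(prefix + (first + i));
  }
  return urls;
}

console.log(isFollowableHref('javascript:nextPage()'));
console.log(pageUrls('http://www.example.com/results/1', 3));
```

Each generated URL could then be passed to x(url, '.results', [...]) in turn. If the page number is not part of the URL and the next page only loads through JavaScript, a plain HTTP scraper cannot follow it, and a headless browser may be needed.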


 