Here is what I am trying to accomplish. I can scrape a web page and extract the information I need, and I have already run this on a couple of websites where the pagination links are readily available in the href attribute. My question is: how do I navigate to the next page when the pagination link is dynamic?
<ul>
  <li>
    <a class="clickPage" href="javascript:previousPage()">1</a>
  </li>
  <li>
    <a class="clickPage active" href="javascript:currentPage()">2</a>
  </li>
  <li>
    <a class="clickPage" href="javascript:nextPage()">Next Page</a>
  </li>
</ul>
Here is the code I have working for other sites:
var request = require('request'),   // simplified HTTP request client
    cheerio = require('cheerio'),   // lean implementation of core jQuery
    Xray = require('x-ray'),        // web scraper
    x = Xray(),
    fs = require('fs');             // file system I/O

/*
TODO: Make this feature dynamic, to take in the URL of the page
var pageUrl;
*/
var status = 'for sale';
var counter = 0;

x('http://www.example.com/results/1', '.results', [{
    id: 'div.grid@id',    // extracts the value of the id attribute
    title: 'div.info h2',
    category: 'span.category',
    price: 'p.price',
    count: counter + 1,   // why doesn't this update? This never shows in the JSON
    status: status        // this value never shows up in the JSON
}])
    .paginate(whatShouldThisBe)
    .limit(800)
    .write('products.json');
Also, the values of count and status never appear in the generated JSON file. I'm not sure what I am doing wrong here, but I would definitely appreciate any help.
Thanks!
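A note on the missing count and status fields: two things are probably at play. The expression counter + 1 is evaluated only once, when the schema object is created, so it could never increase per item. And as far as I understand x-ray, every string in the schema is treated as a selector, so 'for sale' is looked up in the page and matches nothing. Below is a minimal sketch of one workaround, assuming you post-process the scraped array yourself rather than chaining .write(). addFields is a hypothetical helper, not part of x-ray, and the sample data is made up.

```javascript
// Hypothetical helper (not part of x-ray): adds a running count and a
// constant status to each scraped item without mutating the input array.
function addFields(items, status) {
  return items.map(function (item, index) {
    return Object.assign({}, item, {
      count: index + 1, // 1-based position in the result list
      status: status
    });
  });
}

// Stand-in for the array a scraper would hand to a callback (made-up data).
var scraped = [
  { id: 'a1', title: 'First item' },
  { id: 'b2', title: 'Second item' }
];

var products = addFields(scraped, 'for sale');
console.log(JSON.stringify(products, null, 2));
```

With x-ray, this could be called inside the callback form, x(url, scope, schema)(function (err, items) { ... }), and the result written out with fs.writeFileSync.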
Have you tried .paginate('ul li:nth-child(3) a@href')? That way you get the link in the third <li> of the <ul>.
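One caveat: in the markup from the question, the href of the third link is javascript:nextPage(), which is a script call rather than a URL, so the selector would most likely give .paginate() nothing it can request. Since the start URL ends in /results/1, one possible workaround is to build the page URLs yourself and scrape each one in turn. This is only a sketch: it assumes the site serves later pages at /results/2, /results/3 and so on, which the question does not confirm, and buildPageUrls is a hypothetical helper.

```javascript
// Hypothetical helper: builds numbered page URLs from a base such as
// 'http://www.example.com/results/'. Assumes the page number is the last
// path segment, which the original post does not confirm.
function buildPageUrls(baseUrl, firstPage, lastPage) {
  var urls = [];
  for (var page = firstPage; page <= lastPage; page++) {
    urls.push(baseUrl + page);
  }
  return urls;
}

var pageUrls = buildPageUrls('http://www.example.com/results/', 1, 3);
console.log(pageUrls);
```

Each URL could then be passed to x(...) separately. If the site only swaps pages through JavaScript and has no per-page URL, a browser-driven scraper would be needed instead.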