Parsing multiple pages of a website and counting total items
My script gathers the number of reports on a page, then goes to the next page and does the same. The goal is to get the total number of reports across multiple pages.
UPDATED
var casper = require('casper').create({
    clientScripts: ["./lib/jquery-2.1.3.min.js"],
    // verbose: true,
    logLevel: "debug"
});

// forward console.log calls made inside the page to the terminal
casper.on('remote.message', function(msg) {
    this.echo('LOG: ' + msg);
});

casper.on('page.error', function (msg, trace) {
    this.echo( 'Error: ' + msg, 'ERROR' );
});

var reportCount, newReportCount, reportPages;

casper.start("reports.html", function() {
    reportPages = this.evaluate(function() {
        return $('#table2 tbody tr td').children('a').length -1;
    });

    //first page of reports
    reportCount = this.evaluate(function() {
        return $('#table1 tbody').first().children('tr').length;
    });

    this.echo('initial count: ' + reportCount);
    this.echo('pages: ' + reportPages);

    //check if more than 1 page and add report count
    if (reportPages > 1) {
        newReportCount = this.thenOpen('reports2.html', function(){
            var newCount = this.evaluate(function(count) {
                add = count + $('#table1 tbody').first().children('tr').length;
                // console.log('new count inside: ' + add);
                return add;
            }, reportCount);
            console.log(newCount); //this shows correct new value 32
        });

        console.log(newReportCount); //this shows [object Casper]

        neoReportCount = this.thenOpen('reports3.html', function(count){
            console.log(newReportCount); //this shows [object Casper]
            //do the same count
        }, newReportCount);
    }
});

casper.run();
Here is the output in the console:

Pages: 3
First count: 15
[object Casper], currently at file:///**/reports.html
32
[object Casper], currently at file:///**/reports3.html
Yes, it is possible, but you use casper.thenOpenAndEvaluate(), which has the word "then" in it. That means the function is asynchronous: it only schedules a step and returns the casper object to enable a builder/promise pattern. The same holds for casper.thenOpen(), which is why the console shows [object Casper]. So you cannot return anything from a function like this. Since it is asynchronous, its callback is executed after the current step ends, which is after console.log(newCount);.
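To make the ordering concrete, here is a toy model of a step queue in plain JavaScript. It is not CasperJS code: StepQueue, then and run are hypothetical names made up for this illustration. It only sketches the two points above, namely that a then-style method returns the queue object itself, and that the step body runs later than the code written directly after the call.

```javascript
// Toy model of a step queue. NOT CasperJS; all names here are hypothetical.
function StepQueue() {
    this.steps = [];
}

// Schedules a step and returns the queue itself so calls can be chained.
StepQueue.prototype.then = function (fn) {
    this.steps.push(fn);
    return this;
};

// Executes the scheduled steps in order.
StepQueue.prototype.run = function () {
    while (this.steps.length > 0) {
        this.steps.shift().call(this);
    }
};

var log = [];
var queue = new StepQueue();

// The return value is the queue, not whatever the step computes.
var returned = queue.then(function () {
    log.push('step body runs');
});

// This line executes before the step body, because nothing has run yet.
log.push('code after then() runs');

queue.run();
```

After run() finishes, log holds 'code after then() runs' first and 'step body runs' second, and returned is the queue object. This mirrors why assigning the result of thenOpen() to a variable gives the casper object rather than a count.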
You would need to split the function, for example like this:
//check if more than 1 page and add report count
if (reportPages > 1) {
    var newCount;
    this.thenOpen('reports2.html', function(){
        // runs as its own step: first page's count plus this page's rows
        newCount = this.evaluate(function(count){
            var add = count + $('#table1 tbody').first().children('tr').length;
            console.log('new count inside: ' + add);
            return add;
        }, reportCount);
        console.log(newCount);
    }).thenOpen('reports3.html', function(){
        // pass the running total so the first page is not counted twice
        newCount = this.evaluate(function(count){
            var add = count + $('#table1 tbody').first().children('tr').length;
            console.log('new count inside: ' + add);
            return add;
        }, newCount);
        console.log(newCount);
    }).then(function(){
        // total across all three pages
        console.log(newCount);
    });
}
It seems like you want to loop over multiple pages. This is usually done recursively, because CasperJS is asynchronous and you don't know beforehand how many pages you need to open. I suggest you look at this question for some examples: "CasperJS loop or iterate through multiple web pages?"
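As a rough sketch of that recursive approach, the function below schedules one thenOpen step per page and, from inside each step, schedules the next one only if the page links to a following page. Several things here are assumptions and not taken from the question: the function name countAllReports, the onDone callback, and above all the 'a.next' selector used to find the next page, which has to be adapted to the real markup. It relies on jQuery being injected through clientScripts, as in the question.

```javascript
// Sketch only: countAllReports, onDone and the 'a.next' selector are
// hypothetical and must be adapted to the actual pages.
function countAllReports(casper, firstUrl, onDone) {
    var total = 0;

    function visit(url) {
        // Schedule a step for this page; the callback runs later.
        casper.thenOpen(url, function () {
            // Runs in the page context: count rows and look for a next link.
            var page = this.evaluate(function () {
                var next = $('a.next').attr('href'); // assumed selector
                return {
                    rows: $('#table1 tbody').first().children('tr').length,
                    next: next || null
                };
            });

            total += page.rows;

            if (page.next) {
                visit(page.next); // schedule the following page
            } else {
                onDone(total);    // no more pages: report the total
            }
        });
    }

    visit(firstUrl);
}
```

In a script this might be wired up by calling casper.start(), then countAllReports(casper, 'reports.html', function (total) { casper.echo('total: ' + total); }), then casper.run(). Keeping the total in a closure variable avoids trying to return a value out of an asynchronous step.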