How to wait for page loading when using CasperJS?

I am trying to scrape a webpage that has a form with many dropdowns, and the values in the form are interdependent. At many points I need the code to wait until the page refresh completes. For example, after selecting an option from one list, the code should wait until the next list is populated based on that selection. Pointers would be really helpful because, strangely, my code works only after I added a lot of otherwise unnecessary logging statements, which in turn created some delay. Any suggestions to improve the code would be very helpful.

var casper = require('casper').create({
     verbose: true,
     logLevel: 'debug',
     userAgent: 'Mozilla/5.0  poi poi poi (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.172 Safari/537.22',
     pageSettings: {}
 });

 casper.start('http://www.abc.com', function () {
     console.log("casper started");
     this.fill('form[action="http://www.abc.com/forum/member.php"]', {
         quick_username: "qwe",
         quick_password: "qwe"
     }, true);
     this.capture('screen.png');
 });
 casper.thenOpen("http://www.abc.com/search/index.php").then(function () {
     this.click('input[type="checkbox"][name="firstparam"]');
     this.click('a#poi');

     casper.evaluate(function () {
         document.getElementsByName("status")[0].value = 1;
         document.getElementsByName("state")[0].value = 1078;
         // changeState() is tied to the "state" dropdown, and the page reloads at this point.
         // Only after the refresh completes should the code continue. How can this be achieved?
         changeState();
         return true;
     });
     this.echo('Inside the first thenOpen' + this.evaluate(function () {
         return document.search.action;
     }));
 });
 casper.then(function () {
     this.capture("poi.png");
     console.log('just before injecting jquery');
     casper.page.injectJs('./jquery.js');
     this.click('input[type="checkbox"][name="or"]');
     this.evaluate(function () {
         $('.boxline .filelist input:checkbox[value=18127]').attr("checked", true);
     });
     this.echo('Just before pressing the add college button' + this.evaluate(function () {
         return document.search.action;
     }));
     this.capture('collegeticked.png');
     if (this.exists('input[type="button"][name="niv"]')) {
         this.echo('button is there');
     } else {
         this.echo('button is not there');
     }
     this.echo("Going to print return value");
     this.click('input[type="button"][name="poi"]'); // This click again causes a page refresh. Code should wait at this point for completion.
     this.echo('Immediately after pressing the add college btn getPresentState()' + this.evaluate(function () {
         return getPresentState();
     }));
     this.echo('Immediately after pressing add colleg button' + this.evaluate(function () {
         return document.search.action;
     }));
     this.capture('iu.png');
 });

 casper.then(function () {
     console.log('just before form submit');
     this.click('form[name="search"] input[type="submit"]'); //Again page refresh. Wait.
     this.echo('Immediately after search btn getPresentState()' + this.evaluate(function () {
         return getPresentState();
     }));
     this.echo('Immediately after search button-action' + this.evaluate(function () {
         return document.search.action;
     }));
     this.capture("mnf.png");
 });

 casper.then(function () {
     casper.page.injectJs('./jquery.js');
     this.capture("resultspage.png");

     this.echo('Page title is: ' + this.evaluate(function () {
         return document.title;
     }), 'INFO');
     var a = casper.evaluate(function () {
           return $('tbody tr td.tdbottom:contains("tye") ').siblings().filter($('td>a').parent());
     });
     console.log("ARBABU before" + a.length);
 });

 casper.run();

I've been using the waitForSelector 'workaround' mentioned by Arun here: https://stackoverflow.com/a/22217657/1842033

It's the best solution I've found; the 'drawback', as it were, is that you need to know which element you're expecting to load. I say drawback, but personally I don't think I've encountered a situation where I've not had some kind of feedback indicating that whatever I'm waiting for has happened.

this.waitForSelector("{myElement}",
    function pass () {
        test.pass("Found {myElement}");
    },
    function fail () {
        test.fail("Did not load element {myElement}");
    },
    20000 // timeout limit in milliseconds
);

That said, I'd guess you could use waitForResource() or something like it if you didn't have visual feedback.
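A minimal sketch of that idea, assuming the reload fetches a file whose URL you can recognise. `clickAndWaitForResource` is a hypothetical helper, and the selector and file name in the usage comment are placeholders; the CasperJS calls it wraps are `thenClick()` and `waitForResource()`.

```javascript
// Hypothetical helper: click something, then wait until a resource whose URL
// matches `pattern` (a string or RegExp) has been received.
// `casper` is an existing CasperJS instance.
function clickAndWaitForResource(casper, clickSelector, pattern, timeoutMs) {
    casper.thenClick(clickSelector);
    casper.waitForResource(pattern, function onLoaded() {
        this.echo('Resource loaded: ' + pattern);
    }, function onTimeout() {
        this.echo('Timed out waiting for: ' + pattern, 'WARNING');
    }, timeoutMs || 20000); // default timeout is an arbitrary choice
    return casper;
}

// Usage inside a CasperJS script (selector and file name are placeholders):
// clickAndWaitForResource(casper, 'a#poi', /results\.js$/, 20000);
```

Whether this is reliable depends on the page actually requesting that resource on every reload, which is an assumption to verify against the real site.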

What I've taken to doing to get around this issue, when there isn't anything specific to target and wait for in the reloaded page, is the following:

var classname = 'reload-' + (new Date().getTime()),
    callback = function(){},
    timeout = function(){};

/// It happens when they change something...
casper.evaluate(function(classname){
  document.body.className += ' ' + classname;
}, classname);

casper.thenClick('#submit'); /// <-- will trigger a reload of the page
casper.waitWhileSelector('body.' + classname, callback, timeout);

This way I don't have to rely on a specific expected element in the next page; I've basically done the inverse. I've created a specific selector to watch for, and execution moves on once that selector no longer matches.

For my purposes it was enough to know the page had begun reloading; I didn't need to wait until the next page had fully loaded. That way I could then trigger waitForSelector calls on elements that may have existed both before and after the reload. Waiting until the temporary class has been removed tells me that anything that existed before has since been destroyed, so there is no risk of selecting elements from before the reload.

It seems there is no real solution. http://docs.casperjs.org/en/latest/modules/casper.html#waitforselector is an available workaround, but it may not always work.

I have had the same experience doing the same thing as you. Scripting this way, from the user's perspective, has never gone well for me: it crashes in the middle of nowhere and is very unreliable. I was running searches on Salesforce, which also requires a login.

You need to keep your steps to a minimum. Script it the way you would a cron job: don't fill forms or click buttons unless you are doing UI testing. I would advise you to break the process into two parts:

// First part: run the search and find the exact URL of each page you want to capture.
// Save the URLs in a db/csv file.
1 - Start with a POST to http://www.abc.com/forum/member.php with the username and password in the body.
2 - POST/GET to http://www.abc.com/search/index.php with your search criteria. Check what the website requires; if it uses POST, then POST.

// Second part: read your input.
1 - Log in the same way as in the first part.
2 - In casper, forEach over your input and save each capture. (Save the capture result in the db/csv.)
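
The first part could be sketched roughly as follows, using `thenOpen()` with a `method`/`data` settings object instead of driving the form. `queueLoginAndSearch` is a hypothetical helper. The URLs and the `quick_username`/`quick_password` field names come from the question; the search field names are assumptions, and a real login form may also need hidden fields or tokens that are not shown here.

```javascript
// Sketch only: queue a login POST followed by a search POST, skipping the UI.
// `casper` is an existing CasperJS instance; `criteria` is a plain object whose
// keys must match whatever the real search form submits (assumed here).
function queueLoginAndSearch(casper, credentials, criteria) {
    casper.thenOpen('http://www.abc.com/forum/member.php', {
        method: 'post',
        data: {
            quick_username: credentials.username,
            quick_password: credentials.password
        }
    });
    casper.thenOpen('http://www.abc.com/search/index.php', {
        method: 'post',
        data: criteria
    }, function onResults() {
        // Print the resulting URL so it can be stored for the second part.
        this.echo(this.getCurrentUrl());
    });
    return casper;
}

// Usage (values are placeholders):
// queueLoginAndSearch(casper, { username: 'qwe', password: 'qwe' }, { status: 1, state: 1078 });
```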

My script is now pure PhantomJS; the casper script just kept crashing for no reason. Even PhantomJS is unreliable. I save the result/status after each successful search/download, and whenever there is an error I exit the script; otherwise the rest of the results are unpredictable (a result that is good in Chrome can turn out bad in PhantomJS).

I found this question while searching for a solution to a problem where a click() or fill() action reloads exactly the same data in a child iframe. Here is my improvement on Pebbl's answer:

casper.clickAndUnload = function (click_selector, unload_selector, callback, timeout) {
    var classname = 'reload-' + (new Date().getTime());
    this.evaluate(function (unload_selector, classname) {
        $(unload_selector).addClass(classname);
    }, unload_selector, classname);

    this.thenClick(click_selector);
    this.waitWhileSelector(unload_selector + '.' + classname, callback, timeout);
};

casper.fillAndUnload = function (form_selector, data, unload_selector, callback, timeout) {
    var classname = 'reload-' + (new Date().getTime());
    this.evaluate(function (unload_selector, classname) {
        $(unload_selector).addClass(classname);
    }, unload_selector, classname);
    this.fill(form_selector, data, true);
    this.waitWhileSelector(unload_selector + '.' + classname, callback, timeout);
};

This solution assumes that the page uses jQuery. It should not be hard to modify it for pages that don't. unload_selector is an element that is expected to be reloaded after the click or form submission.
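For pages without jQuery, one possible adaptation is sketched below. `markElements` and `clickAndUnloadPlain` are hypothetical names; the only change from clickAndUnload is that the marker class is added with plain DOM calls.

```javascript
// Runs inside the page (passed to casper.evaluate): tag every element matching
// `selector` with a marker class, using plain DOM APIs instead of jQuery.
function markElements(selector, classname) {
    var nodes = document.querySelectorAll(selector);
    for (var i = 0; i < nodes.length; i++) {
        nodes[i].className += ' ' + classname;
    }
    return nodes.length;
}

// Hypothetical jQuery-free counterpart of clickAndUnload.
// `casper` is an existing CasperJS instance.
function clickAndUnloadPlain(casper, clickSelector, unloadSelector, callback, timeout) {
    var classname = 'reload-' + new Date().getTime();
    casper.evaluate(markElements, unloadSelector, classname);
    casper.thenClick(clickSelector);
    // Continues once no element carries the marker class any more.
    casper.waitWhileSelector(unloadSelector + '.' + classname, callback, timeout);
    return casper;
}
```

As with the original helper, this is meant to be called from inside a step, so that the page is already loaded when the marker class is added.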

Since CasperJS is written for developers, it is expected that one knows what state the loaded page should be in, and which elements should be available to define a page-loaded state.
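
For the dependent dropdowns in the question, the "loaded" state could be defined as "the next select has been populated". A minimal sketch using `casper.waitFor()`, with hypothetical helper names and a placeholder field name in the usage comment:

```javascript
// Runs inside the page: true once the <select> named `name` holds more than
// `minOptions` options (e.g. more than just a placeholder entry).
function selectIsPopulated(name, minOptions) {
    var el = document.getElementsByName(name)[0];
    return !!el && !!el.options && el.options.length > minOptions;
}

// Hypothetical helper: queue a wait on that condition.
// `casper` is an existing CasperJS instance.
function waitForSelectOptions(casper, name, minOptions, then, timeoutMs) {
    return casper.waitFor(function check() {
        return casper.evaluate(selectIsPopulated, name, minOptions);
    }, then, function onTimeout() {
        casper.echo('Dropdown "' + name + '" was not populated in time', 'WARNING');
    }, timeoutMs || 10000); // default timeout is an arbitrary choice
}

// Usage ('city' is a placeholder for the real dependent field name):
// waitForSelectOptions(casper, 'city', 1, function () { this.capture('populated.png'); });
```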

One option is to check for the presence of, for example, a JavaScript resource that is loaded at the end of the page.

When running any type of test, results must be reproducible each time, so idempotency is essential. For this to happen, the tester must be able to control the environment sufficiently.

Just evaluate document.readyState and check that it is complete or interactive. Then the page is loaded.

This is an implementation with a while loop, but maybe it can be done with an interval...

this.then(function () {
 while(this.evaluate(function () { return document.readyState != 'complete' && document.readyState != 'interactive'; })) {}
});
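
An interval-based variant might look like the sketch below, which hands the same readyState check to `casper.waitFor()` so it is re-checked periodically instead of spinning in a loop. The helper names are hypothetical. One caveat: immediately after a click the old document is usually still `complete`, so this check alone can pass before the reload has even started.

```javascript
// Runs inside the page: true once the document is at least interactive.
function documentIsReady() {
    return document.readyState === 'complete' || document.readyState === 'interactive';
}

// Hypothetical helper: poll readyState through casper.waitFor.
// `casper` is an existing CasperJS instance.
function waitForDocumentReady(casper, then, timeoutMs) {
    return casper.waitFor(function check() {
        return casper.evaluate(documentIsReady);
    }, then, function onTimeout() {
        casper.echo('Document did not become ready in time', 'WARNING');
    }, timeoutMs || 10000); // default timeout is an arbitrary choice
}

// Usage:
// waitForDocumentReady(casper, function () { this.capture('ready.png'); });
```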
