
Heisenbug with own headless browser

I'm working on a headless browser based on WebKit (using C++/Qt4) with JavaScript support. Its main purpose is to generate HTML snapshots of websites that rely heavily on JavaScript (e.g. sites built with Backbone.js or any other JavaScript MVC framework).

I'm aware that there is no way to know when a page has completely loaded (please see this question). Because of that, after I get the loadFinished signal (docs here) I start a timer and poll the DOM content, checking it every X ms to see whether anything has changed. If nothing has changed, I assume the page has finished loading and print the result. I already know this is far from a perfect solution, but it's the only one I could think of. If you have a better idea, please answer this question.
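The polling heuristic described above can be sketched as a small state machine. This is only a sketch under stated assumptions and is not the original source. `DomStabilityDetector` is a hypothetical name. The serialized DOM is assumed to come from something like `QWebFrame::toHtml()` on each timer tick. It also requires several consecutive unchanged polls rather than one, which makes it less likely to stop during a short pause in script activity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Qt or the original code): decides when the
// DOM has "settled" by comparing each polled snapshot with the previous one.
class DomStabilityDetector {
public:
    // requiredStableTicks: consecutive unchanged polls needed before we
    // declare the page "loaded".
    explicit DomStabilityDetector(std::size_t requiredStableTicks)
        : required_(requiredStableTicks) {
        assert(requiredStableTicks > 0);
    }

    // Call once per timer tick with the current serialized DOM.
    // Returns true once the DOM has stayed identical for the required
    // number of consecutive ticks.
    bool onTick(const std::string& html) {
        if (hasPrevious_ && html == previous_) {
            ++stableTicks_;   // unchanged since the last poll
        } else {
            stableTicks_ = 0; // first poll, or the DOM changed: start over
        }
        previous_ = html;
        hasPrevious_ = true;
        return stableTicks_ >= required_;
    }

private:
    std::size_t required_;
    std::size_t stableTicks_ = 0;
    std::string previous_;
    bool hasPrevious_ = false;
};
```

In the Qt version, the timer's `timeout` slot would call `onTick` and, once it returns true, stop the timer and print the snapshot.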

NOTE: The timer is non-blocking, meaning that nothing running inside WebKit should be affected, blocked, or paused by it in any way.

After testing the headless browser with several pages, everything seems to work fine (or at least as expected). But this is where the heisenbug appears. The headless browser is meant to be called from a PHP script, which waits (blocking call) for its output and then prints it.

On my test machine (Apache 2.3.14, PHP 5.4.6), running the PHP script produces the desired result, i.e. the headless browser fetches the website, runs the JavaScript and prints what a user would see. Running the same script on the production server, however, fetches the website, runs only some of the JavaScript code and prints that incomplete result.

The source code of the headless browser and the PHP script I'm using can be found here.

NOTE: The timer (as you can see in the source code of the headless browser) is set to 1s, but setting a larger value doesn't fix the problem.

NOTE 2: Catching all JavaScript errors doesn't reveal anything, so the problem isn't a missing function, wrong arguments, or any other kind of incorrect code.

I'm testing the headless browser with two websites. This one works on both my test machine and the production server, while this one works only on my test machine.

I'm more inclined to think that this is some weird bug in the JavaScript code of the second website rather than in the headless browser, since the browser generates a perfect HTML snapshot of the first website. Then again, this is a heisenbug, so I'm not really sure what is causing all this.

Any ideas or comments will be appreciated. Thank you.

Rather than polling for DOM changes, why not watch network requests? This seems like a safer heuristic to use. If there has been no network activity for X ms (and there are no pending requests), then assume page is fully "loaded".
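A minimal sketch of that network-idle heuristic, assuming you can observe when each request starts and finishes. In Qt4 that might mean a custom `QNetworkAccessManager` installed with `QWebPage::setNetworkAccessManager()`, overriding `createRequest()` and connecting to each reply's `finished()` signal. That wiring is an assumption and is not shown. `NetworkIdleTracker` is a hypothetical name. Timestamps are passed in explicitly so the logic is independent of any event loop.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: tracks in-flight requests and the time of the last
// network activity. The page is considered "loaded" once nothing is pending
// and no request has started or finished for at least quietPeriod.
class NetworkIdleTracker {
public:
    using Clock = std::chrono::steady_clock;

    NetworkIdleTracker(std::chrono::milliseconds quietPeriod, Clock::time_point start)
        : quiet_(quietPeriod), lastActivity_(start) {}

    // Call when a request is issued (e.g. from an overridden createRequest()).
    void requestStarted(Clock::time_point now) {
        ++pending_;
        lastActivity_ = now;
    }

    // Call when a reply finishes, successfully or with an error.
    void requestFinished(Clock::time_point now) {
        assert(pending_ > 0);
        --pending_;
        lastActivity_ = now;
    }

    // True when no requests are pending and the network has been quiet
    // for at least the configured period.
    bool isIdle(Clock::time_point now) const {
        return pending_ == 0 && now - lastActivity_ >= quiet_;
    }

private:
    std::chrono::milliseconds quiet_;
    int pending_ = 0;
    Clock::time_point lastActivity_;
};
```

A periodic timer would then call `isIdle(Clock::now())` and take the snapshot once it returns true. Combining this check with the DOM check from the question is also possible if scripts keep mutating the page after the last response arrives.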
