Heisenbug with own headless browser

2012-11-29 10:05:34 | Tags: javascript, c++, webkit, qt4, headless-browser

I'm working on a headless browser based on WebKit (using C++/Qt4) with JavaScript support. Its main purpose is to generate an HTML snapshot of websites that rely heavily on JavaScript (see Backbone.js or any other JavaScript MVC framework).

I'm aware that there isn't any way of knowing when the page is completely loaded (please see this question). Because of that, after I get the loadFinished signal (docs here) I create a timer and start polling the DOM content (that is, checking the content of the DOM every X ms) to see if there were any changes. If there weren't, I assume that the page has loaded and print the result. Please keep in mind that I already know this is nowhere near a perfect solution, but it's the only one I could think of. If you have a better idea, please answer this question.

NOTE: The timer is non-blocking, meaning that everything running inside WebKit shouldn't be affected/blocked/paused in any way.
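For illustration only, the polling check described above might be sketched as follows. This is not the code from the linked source; `DomStabilityDetector` and its parameter are hypothetical names, and the sketch is deliberately Qt-free so it can be compiled on its own. In the real browser the string passed to `poll()` would presumably be the serialized DOM (for example from `QWebFrame::toHtml()`), fetched from a timer slot.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original source): decides when a
// periodically polled DOM snapshot has stopped changing.
class DomStabilityDetector {
public:
    // requiredStablePolls: how many consecutive unchanged polls are needed
    // before the page is assumed to be loaded (assumed default: 1).
    explicit DomStabilityDetector(std::size_t requiredStablePolls = 1)
        : required_(requiredStablePolls) {}

    // Feed the serialized DOM from one timer tick. Returns true once the
    // snapshot has matched the previous one `required_` times in a row.
    bool poll(const std::string& html) {
        if (hasPrevious_ && html == previous_) {
            ++stableCount_;
        } else {
            // First poll, or the DOM changed: restart the count.
            stableCount_ = 0;
            previous_ = html;
            hasPrevious_ = true;
        }
        return stableCount_ >= required_;
    }

private:
    std::size_t required_;
    std::size_t stableCount_ = 0;
    bool hasPrevious_ = false;
    std::string previous_;
};
```

Requiring more than one unchanged poll makes the check less likely to fire during a short pause between two bursts of JavaScript activity, at the cost of a longer wait before printing.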

After testing the headless browser with some pages, everything seems to work fine (or at least as expected). But here is where the heisenbug appears. The headless browser is meant to be called from a PHP script, which should wait (blocking call) for some output and then print it.

On my test machine (Apache 2.3.14, PHP 5.4.6), running the PHP script outputs the desired result: the headless browser fetches the website, runs the JavaScript and prints what a user would see. Running the same script on the production server, however, fetches the website, runs only some of the JavaScript code and prints the result.

The source code of the headless browser and the PHP script I'm using can be found here.

NOTE: The timer (as you can see in the source code of the headless browser) is set to 1s, but setting a longer interval doesn't fix the problem.

NOTE 2: Catching all JavaScript errors doesn't show anything, so it's not because of a missing function, wrong args, or any other type of incorrect code.

I'm testing the headless browser with 2 websites. The first one works on both my test machine and the production server, while the second one works only on my test machine.

I'm more inclined to think that this is some weird bug in the JavaScript code of the second website rather than in the code of the headless browser, as it generates a perfect HTML snapshot of the first website. But then again, this is a heisenbug, so I'm not really sure what is causing all this.

Any ideas/comments will be appreciated. Thank you.

1 answer

Rather than polling for DOM changes, why not watch network requests? This seems like a safer heuristic to use. If there has been no network activity for X ms (and there are no pending requests), then assume the page is fully "loaded".
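A minimal sketch of that heuristic, under some assumptions: `NetworkIdleDetector`, its method names and the millisecond timestamps are all hypothetical, and the class is kept free of Qt so it compiles standalone. In a Qt4 application the two notification methods could plausibly be driven by a `QNetworkAccessManager` subclass (overriding `createRequest()` for starts and listening to the `finished(QNetworkReply*)` signal for completions), with `isIdle()` checked from a timer.

```cpp
#include <cstdint>

// Hypothetical helper: tracks in-flight requests and the time of the last
// network event, and reports when the network has been quiet long enough.
class NetworkIdleDetector {
public:
    // quietPeriodMs: the "X ms" of silence required by the heuristic.
    // startMs: timestamp at which observation begins.
    NetworkIdleDetector(std::int64_t quietPeriodMs, std::int64_t startMs)
        : quietPeriodMs_(quietPeriodMs), lastActivityMs_(startMs) {}

    // Call when a request is issued.
    void requestStarted(std::int64_t nowMs) {
        ++pending_;
        lastActivityMs_ = nowMs;
    }

    // Call when a request completes (successfully or not).
    void requestFinished(std::int64_t nowMs) {
        if (pending_ > 0) {
            --pending_;
        }
        lastActivityMs_ = nowMs;
    }

    // True when nothing is pending and no network event has occurred
    // for at least the quiet period.
    bool isIdle(std::int64_t nowMs) const {
        return pending_ == 0 && (nowMs - lastActivityMs_) >= quietPeriodMs_;
    }

private:
    std::int64_t quietPeriodMs_;
    std::int64_t lastActivityMs_;
    int pending_ = 0;
};
```

Passing the timestamp in explicitly, rather than reading a clock inside the class, keeps the logic easy to test. Note that a page that keeps a long-polling request open would never look idle under this rule, so a global timeout is probably still needed.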