
HtmlUnit doesn't wait for JavaScript

I have a GWT-based page for which I would like to create an HTML snapshot using HtmlUnit. The page loads product information using Ajax/JavaScript, so for about 1 second a "Loading..." message is shown and then the content appears.

The problem is that HtmlUnit doesn't seem to capture that information; all I get is the "Loading..." span.

Below is some experimental HtmlUnit code in which I try to give it enough time to wait for the data to load. It doesn't seem to change anything: I am still unable to capture the data loaded by the GWT JavaScript.

        WebClient webClient = new WebClient();
        webClient.setJavaScriptEnabled(true);
        webClient.setThrowExceptionOnScriptError(false);   // don't abort on script errors
        webClient.setAjaxController(new NicelyResynchronizingAjaxController());

        WebRequest request = new WebRequest(new URL("<my_url>"));
        HtmlPage page = webClient.getPage(request);

        // returns the number of background JavaScript jobs still pending
        int i = webClient.waitForBackgroundJavaScript(1000);

        while (i > 0)
        {
            i = webClient.waitForBackgroundJavaScript(1000);

            if (i == 0)
            {
                break;
            }
            synchronized (page)
            {
                System.out.println("wait");
                page.wait(500);
            }
        }

        // only asks the controller whether such a request would be handled
        // synchronously; it does not load or wait for anything itself
        webClient.getAjaxController().processSynchron(page, request, false);

        System.out.println(page.asXml());

Any ideas?

Thank you for responding. I should have reported sooner that I found the solution myself. Apparently it works when WebClient is initialised to emulate Firefox:

WebClient webClient = new WebClient(BrowserVersion.FIREFOX_3_6);

It seems to work. When WebClient is initialised with the default constructor it emulates IE7, and I guess Firefox has better Ajax support and is the recommended emulator to use.

I believe that by default NicelyResynchronizingAjaxController only resynchronizes AJAX calls caused by a user action, which it detects by tracking the thread the call originated from. Perhaps the GWT-generated JavaScript is being called from some other thread that NicelyResynchronizingAjaxController does not want to wait for.

Try declaring your own AjaxController that synchronizes with everything, regardless of the originating thread:

webClient.setAjaxController(new AjaxController(){
    @Override
    public boolean processSynchron(HtmlPage page, WebRequest request, boolean async)
    {
        return true;
    }
});
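To make the reasoning in this answer concrete, here is a small plain-Java model of it. It does not use HtmlUnit: ThreadTrackingController, AlwaysSyncController and ThreadProbe are hypothetical names, and the rule "resynchronize async calls only when they come from the thread that created the controller" is a simplified assumption based on the answer's description of NicelyResynchronizingAjaxController, not its actual source.

```java
/**
 * Toy model (NOT HtmlUnit code) of the behaviour described above:
 * async AJAX calls are made synchronous only when they come from the
 * thread that created the controller; all others stay asynchronous.
 */
class ThreadTrackingController {
    // Thread that created the controller (stands in for the "user action" thread).
    private final Thread originThread = Thread.currentThread();

    /** Returns true if the request should be processed synchronously. */
    boolean processSynchron(boolean async) {
        if (!async) {
            return true; // synchronous requests stay synchronous
        }
        // Resynchronize only async calls made on the originating thread.
        return Thread.currentThread() == originThread;
    }
}

/** Toy model of the override suggested above: every request is synchronous. */
class AlwaysSyncController extends ThreadTrackingController {
    @Override
    boolean processSynchron(boolean async) {
        return true;
    }
}

/** Hypothetical helper: asks a controller for its decision from another thread. */
class ThreadProbe {
    static boolean decideOnOtherThread(ThreadTrackingController controller, boolean async) {
        boolean[] result = new boolean[1];
        Thread worker = new Thread(() -> result[0] = controller.processSynchron(async));
        worker.start();
        try {
            worker.join(); // join() makes the worker's write to result visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }
}
```

In this model, an async call issued from a background thread is left asynchronous by ThreadTrackingController, so nothing waits for it, while AlwaysSyncController forces it to be synchronous. That is the difference the answer suggests may matter for GWT-generated calls.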

None of the solutions provided so far worked for me. I ended up with Dan Alvizu's solution plus my own hack:

private WebClient webClient = new WebClient();

public void scrapPage() {
    makeWebClientWaitThroughJavaScriptLoadings();
    HtmlPage page = login();   // login() is the answerer's own helper (not shown)
    // do something that triggers JavaScript loading
    waitOutLoading(page);
}

private void makeWebClientWaitThroughJavaScriptLoadings() {
    // process every AJAX request synchronously, whatever thread it comes from
    webClient.setAjaxController(new AjaxController(){
        @Override
        public boolean processSynchron(HtmlPage page, WebRequest request, boolean async)
        {
            return true;
        }
    });
}

private void waitOutLoading(HtmlPage page) {
    // note: loops forever if the loading text never disappears
    while(page.asText().contains("Please wait while loading!")){
        webClient.waitForBackgroundJavaScript(100);
    }
}

Needless to say, "Please wait while loading!" should be replaced with whatever text your page shows while it is loading. If there is no text, there may be a way to check for the presence of some GIF (if one is used). Of course, you could simply pass a large enough millisecond value if you're feeling adventurous.
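If you would rather not risk looping forever on the loading text, a deadline-bounded poll is one option. This is a plain-Java sketch with no HtmlUnit dependency: BoundedWait and waitUntil are hypothetical names, and Thread.sleep stands in for whatever actually lets the page make progress.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper: poll a condition until it holds or a deadline passes. */
class BoundedWait {
    /**
     * Re-checks {@code done} every {@code intervalMs} milliseconds.
     * Returns true as soon as it holds, false if {@code timeoutMs} elapses first.
     */
    static boolean waitUntil(BooleanSupplier done, long timeoutMs, long intervalMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!done.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: never loops forever
            }
            try {
                // An HtmlUnit caller might call
                // webClient.waitForBackgroundJavaScript(intervalMs) here instead (assumption).
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

Wired to HtmlUnit, the condition might be `() -> !page.asText().contains("Please wait while loading!")`; that wiring is an untested assumption. The return value lets the caller decide what to do when the page never finishes loading, instead of hanging.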

As the documentation states, waitForBackgroundJavaScript is experimental:

Experimental API: May be changed in next release and may not yet work perfectly!

The following approach has always worked for me, regardless of the BrowserVersion used:

int tries = 5;  // Amount of tries to avoid infinite loop
while (tries > 0 && aCondition) {
    tries--;
    synchronized(page) {
        page.wait(2000);  // How often to check
    }
}

Note that aCondition is whatever you're checking for, e.g.:

page.getElementById("loading-text-element").asText().equals("Loading...")
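The tries-bounded loop relies on Object.wait releasing the monitor of the object it is called on, so any other thread that needs that monitor can run in the meantime, and a notify/notifyAll on that object can end the wait early. Whether HtmlUnit's JavaScript threads actually synchronize on the page is not stated in the answer, so the plain-Java model below only illustrates the waiting pattern. FakePage and TriesWait are hypothetical stand-ins; with a real HtmlUnit page nothing is guaranteed to call notifyAll, in which case each try simply lasts the full wait time.

```java
/** Hypothetical stand-in for the page: a monitor plus a flag a worker flips. */
class FakePage {
    private boolean loading = true; // guarded by this

    synchronized boolean isLoading() {
        return loading;
    }

    /** Simulates background JavaScript finishing: update state, wake waiters. */
    synchronized void finishLoading() {
        loading = false;
        notifyAll();
    }
}

class TriesWait {
    /** Returns true if loading finished within the given number of tries. */
    static boolean waitWithTries(FakePage page, int tries, long waitMs) {
        while (tries > 0 && page.isLoading()) { // tries bound avoids an infinite loop
            tries--;
            synchronized (page) {
                try {
                    page.wait(waitMs); // releases page's monitor while waiting
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return !page.isLoading();
    }
}
```

If the worker finishes between the isLoading() check and the wait, that try just runs until its timeout and the next check sees the updated state, so the loop is still bounded by tries × waitMs.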

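Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM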