
Error while using HtmlUnit

When I run this simple code to get the contents of a website as text, it throws errors that I can't understand.

import java.io.IOException;
import java.net.MalformedURLException;

import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.ScriptException;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class sd {
    public static void main(String[] args) {
        sd vip=new sd();
        try {
            vip.homePage();
        } catch (Exception e) {
            e.printStackTrace();
        }

        System.out.print("sssss");
    }

    public void homePage() throws Exception, ScriptException {
        final WebClient webClient = new WebClient();
        final HtmlPage page =
            (HtmlPage) webClient.getPage("http://timesofindia.indiatimes.com/");
        String pageAsText = page.asText();
        String pageAsXML = page.asXml();

        // System.out.println(pageAsXML);
        System.out.println("////////////////////output//////////////////////////"); 
        System.out.println(pageAsText);
        // System.out.println(pageAsXML);
        System.out.println("////////////////////output ends//////////////////////////"); 
    }

}

The error I get:

======= EXCEPTION START ========
Exception class=[com.gargoylesoftware.htmlunit.ScriptException]
com.gargoylesoftware.htmlunit.ScriptException: Exception invoking jsxFunction_write
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:595)
Caused by: java.lang.RuntimeException: Exception invoking jsxFunction_write
Caused by: com.gargoylesoftware.htmlunit.ScriptException: Exception invoking jsxFunction_write
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:595)

The WebClient::setThrowExceptionOnScriptError method has been deprecated since HtmlUnit version 2.11. In newer versions, use the following instead:

webClient.getOptions().setThrowExceptionOnScriptError(false);

Set your WebClient to not throw JavaScript exceptions:

webClient.setThrowExceptionOnScriptError(false);

If that is not enough, set Firefox as the emulated browser when initializing your WebClient:

// depending on HtmlUnit version, use one of:
webClient = new WebClient(BrowserVersion.FIREFOX_3_6);
webClient = new WebClient(BrowserVersion.FIREFOX_10);

I had this error too. Setting the WebClient to suppress script errors works for basic websites, but as the website becomes more complex, it simply fails.

After multiple trials, I finally had to choose PhantomJS. It is written in C++. I had to write some scripts and then execute them using phantomjs. The script would load the URL and write the data to a file.

Once that file was ready, I would write a Java program to load the file data and then do my operations on it. For loading and scraping the data, I used Jsoup.
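The Java side of that workflow could look roughly like the sketch below. It is only an illustration under some assumptions: `phantomjs` is on the PATH, and `save_page.js` is a hypothetical script of your own that takes a URL and an output path as arguments and writes the rendered HTML there. The file name `page.html` is also just a placeholder. Only the standard library is used here; the returned string is what you would then hand to Jsoup for parsing.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

class PhantomFetch {

    // Builds the command line: phantomjs <script> <url> <outputFile>.
    // The script name and its argument order are hypothetical.
    static List<String> buildCommand(String script, String url, Path output) {
        return List.of("phantomjs", script, url, output.toString());
    }

    // Runs PhantomJS and waits for it to finish (assumes phantomjs is on the PATH).
    static void renderToFile(String script, String url, Path output)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(script, url, output))
                .inheritIO() // show PhantomJS console output in this process
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("phantomjs exited with code " + exitCode);
        }
    }

    // Loads the rendered HTML that the script wrote to disk.
    static String loadRendered(Path output) throws IOException {
        return new String(Files.readAllBytes(output), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        Path output = Paths.get("page.html"); // placeholder file name
        renderToFile("save_page.js", "http://timesofindia.indiatimes.com/", output);
        String html = loadRendered(output);
        // At this point the HTML string can be passed to a parser such as Jsoup.
        System.out.println("Loaded " + html.length() + " characters");
    }
}
```

Keeping the rendering step in a separate process means a JavaScript failure in the page cannot throw inside the Java program; the worst case is a non-zero exit code, which is turned into an IOException here.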

As you can see, HtmlUnit, Jaunt, and Jsoup handle HTML and CSS well. What they lack is complete JavaScript support. That is the main cause of problems such as thrown exceptions, pages not loading completely, and so on.
