
How to get a web page that contains errors with HtmlUnit?

I'm trying to load an Ajax page (the URL is in the code below) from my Java program using the HtmlUnit 2.15 API, but the call that fetches the page fails. I suspect the cause is that the site requests a broken/missing font file (the .woff file referenced in the CSS further down).

My code:

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class HtmlUnitExample {
    // "throws Exception" already covers the more specific checked exceptions
    public static void main(String[] args) throws Exception {
        WebClient webClient = new WebClient(BrowserVersion.FIREFOX_24);
        webClient.getOptions().setTimeout(120000); // timeout in milliseconds
        webClient.getOptions().setRedirectEnabled(true);
        webClient.getOptions().setJavaScriptEnabled(true);
        // Do not abort on HTTP error codes or JavaScript errors
        webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        webClient.getOptions().setCssEnabled(true);
        webClient.getOptions().setUseInsecureSSL(true);
        webClient.getOptions().setDoNotTrackEnabled(true);
        webClient.setAjaxController(new NicelyResynchronizingAjaxController());
        String url = "http://www.santanderuniversidades.com.br/JuriPopular/index.aspx?idprojeto=16";
        final HtmlPage page = webClient.getPage(url); // Fails here
        // Wait for background JavaScript only after the page has been requested;
        // before getPage() there are no background jobs to wait for
        webClient.waitForBackgroundJavaScript(60000);
        System.out.println(page.asXml());
    }
}

Error message:

Exception in thread "main" java.io.EOFException: Unexpected end of ZLIB input stream
    at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:240)
    at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
    at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:116)
    at java.io.FilterInputStream.read(FilterInputStream.java:107)
    at org.apache.http.client.entity.LazyDecompressingInputStream.read(LazyDecompressingInputStream.java:68)
    at com.gargoylesoftware.htmlunit.HttpWebConnection.downloadContent(HttpWebConnection.java:693)
    at com.gargoylesoftware.htmlunit.HttpWebConnection.downloadResponseBody(HttpWebConnection.java:675)
    at com.gargoylesoftware.htmlunit.HttpWebConnection.getResponse(HttpWebConnection.java:201)
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseFromWebConnection(WebClient.java:1313)
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponse(WebClient.java:1230)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:338)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:407)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:392)
    at HtmlUnitExample.main(HtmlUnitExample.java:42)//getPage line
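
For context on what this trace means: the exception is raised inside GZIPInputStream while HtmlUnit downloads the response body, which indicates that the gzip-compressed data ended before the stream was complete. The sketch below is only an illustration using the standard library; the class and method names (TruncatedGzipDemo, gzip, gunzipLength, failsWithEof) are made up for this example, and it does not show why the server sent an incomplete body. It compresses some bytes, cuts the compressed data in half, and shows that reading the truncated stream ends in an EOFException.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class, not part of HtmlUnit or of the original program.
public class TruncatedGzipDemo {

    // Gzip-compress a byte array in memory.
    public static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        }
        return out.toByteArray();
    }

    // Read a gzip stream to its end and return the number of decompressed bytes.
    public static int gunzipLength(byte[] compressed) throws IOException {
        int total = 0;
        byte[] buf = new byte[1024];
        try (GZIPInputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    // Return true if reading the stream ends in an EOFException.
    public static boolean failsWithEof(byte[] compressed) throws IOException {
        try {
            gunzipLength(compressed);
            return false;
        } catch (EOFException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        // Fixed seed so the run is repeatable; random bytes keep the
        // compressed form large enough to truncate in the middle.
        byte[] payload = new byte[10_000];
        new Random(42).nextBytes(payload);

        byte[] full = gzip(payload);
        byte[] cut = Arrays.copyOf(full, full.length / 2); // simulate a cut-off response

        System.out.println("complete stream fails:  " + failsWithEof(full));
        System.out.println("truncated stream fails: " + failsWithEof(cut));
    }
}
```

In other words, the trace points at the body of a compressed response being cut short, not at a particular file being absent.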

The page's link to the CSS:

<link href='/JuriPopular/App_Themes/estilo/css.axd?files=jPages.css,estilo.css,jquery.fancybox.css' type='text/css' rel='stylesheet' />

The CSS that references the missing font file:

@font-face{font-family:'DigitalDotRoadsign';
src:url('fonts/DigitalDotRoadsign.eot');
src:url('fonts/DigitalDotRoadsign.eot?#iefix') format('embedded-opentype'),
url('fonts/DigitalDotRoadsign.woff') format('woff'), /* requests the missing file */
url('fonts/DigitalDotRoadsign.ttf') format('truetype'),
url('fonts/DigitalDotRoadsign.svg#svgDigitalDotRoadsign') format('svg');
font-weight:normal;
}

Is this the source of my problem? If so, is there any way to avoid it, perhaps by ignoring or removing the cause of the issue?

Actually, to fix the issue I simply enabled cookies. I guess they were necessary to load the page.

The code:

webClient.getCookieManager().setCookiesEnabled(true);
