How do I get HTML created by JavaScript using HtmlUnit in Java and then parse it with Jsoup?

I am trying to access some content on a web page that is created by JavaScript. However, the content I want is generated by the script after the page has loaded, so this chunk of HTML source is nowhere to be found when I try to parse it with Jsoup.

My code for getting the HTML source using HtmlUnit is as follows:

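import java.io.IOException;
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public static void main(String[] args) throws IOException {
    // Silence HtmlUnit's logging
    java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(java.util.logging.Level.OFF);

    WebClient webClient = new WebClient(BrowserVersion.CHROME);
    // Keep going even if the page's scripts throw or the server returns an error status
    webClient.getOptions().setThrowExceptionOnScriptError(false);
    webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);

    String url = "myUrl.com"; // placeholder; a real URL needs a scheme such as http://
    System.out.println("accessing " + url);

    HtmlPage page = webClient.getPage(url);

    // Give background JavaScript time to run before reading the DOM
    System.out.println("waiting for js");
    webClient.waitForBackgroundJavaScriptStartingBefore(200);
    webClient.waitForBackgroundJavaScript(20000);

    // Print the current DOM, serialized as XML
    System.out.println(page.asXml());

    webClient.close();
}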

But when I run it, the HTML that is supposed to be created is not printed. How do I get the HTML source created by the JavaScript using HtmlUnit, and then pass the result to Jsoup for parsing?

Jsoup is a server-side processing library. I am not sure what your final goal is; I assume you want to use the result in the same page, so I would go with Ajax. You can:

  • On document ready, capture the document dom
  • Send it for processing on server side
  • Display the results on the same page

Something like:

$(document).ready(function () {
    var allClientSideHtml = $("html").html();
    var dataToSend = JSON.stringify({ 'htmlSendToSever': allClientSideHtml });

    $.ajax({
        url: "your_Jsoup_server_url.jsp_or_php/YourJsoupParser",
        type: "POST",
        contentType: "application/json; charset=utf-8",
        dataType: "json",
        data: dataToSend, // pass that text to the server as a JSON string
        success: function (msg) { alert(msg.d); },
        error: function (type) { alert("ERROR!!" + type.responseText); }
    });
});
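
If you would rather stay entirely in Java with HtmlUnit, one thing that may help is replacing the single fixed wait with a loop that re-reads the page until the expected content shows up or a deadline passes. The following is only a minimal sketch of that polling idea using the standard library. The class `PollingWait`, the method `waitForMarker`, the marker string and the fake page supplier are all made-up names for illustration; in the question's code the supplier would presumably be something like `page::asXml`, which I have not tested against a real site.

```java
import java.util.function.Supplier;

public class PollingWait {

    /**
     * Calls source repeatedly until the returned text contains marker
     * or timeoutMillis has elapsed. Returns the last text that was fetched,
     * whether or not the marker was found.
     */
    public static String waitForMarker(Supplier<String> source, String marker,
                                       long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        String text = source.get();
        while (!text.contains(marker) && System.currentTimeMillis() < deadline) {
            Thread.sleep(pollMillis);
            text = source.get();
        }
        return text;
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical stand-in for page.asXml(): the generated element
        // only appears from the third call onwards.
        int[] calls = {0};
        Supplier<String> fakePage = () ->
                ++calls[0] < 3
                        ? "<html><body></body></html>"
                        : "<html><body><div id=\"result\">done</div></body></html>";

        String html = waitForMarker(fakePage, "id=\"result\"", 2000, 50);
        System.out.println(html.contains("done"));
    }
}
```

Once the returned string contains the generated markup, it can be handed to Jsoup on the Java side, for example with `Jsoup.parse(html, url)`, with no round trip through the browser. The caller should still check for the marker, since the method returns the last snapshot even when it times out.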
