In Java and HtmlUnit, how to wait for a resulting page to finish loading and download it as HTML?

HtmlUnit is an awesome Java library that lets you programmatically fill out and submit web forms. I'm currently maintaining a pretty old system written in ASP, and instead of manually filling out one web form every month as I'm required to, I'm trying to automate the entire task because I keep forgetting about it. It's a form for retrieving data gathered within a month. Here's what I've coded so far:

WebClient client = new WebClient();
HtmlPage page = client.getPage("http://urlOfTheWebsite.com/search.aspx");

HtmlForm form = page.getFormByName("aspnetForm");       
HtmlSelect frMonth = form.getSelectByName("ctl00$cphContent$ddlStartMonth");
HtmlSelect frDay = form.getSelectByName("ctl00$cphContent$ddlStartDay");
HtmlSelect frYear = form.getSelectByName("ctl00$cphContent$ddlStartYear");
HtmlSelect toMonth = form.getSelectByName("ctl00$cphContent$ddlEndMonth");
HtmlSelect toDay = form.getSelectByName("ctl00$cphContent$ddlEndDay");
HtmlSelect toYear = form.getSelectByName("ctl00$cphContent$ddlEndYear");
HtmlCheckBoxInput games = form.getInputByName("ctl00$cphContent$chkListLottoGame$0");
HtmlSubmitInput submit = form.getInputByName("ctl00$cphContent$btnSearch");

frMonth.setSelectedAttribute("1", true);
frDay.setSelectedAttribute("1", true);
frYear.setSelectedAttribute("2012", true);
toMonth.setSelectedAttribute("1", true);
toDay.setSelectedAttribute("31", true);
toYear.setSelectedAttribute("2012", true);
games.setChecked(true);
submit.click();

After the click(), I need to wait for the very same web page to finish reloading, because somewhere on it there is a table that displays the results of my search. Then, when the page is done loading, I need to download it as an HTML file (very much like "Save Page As..." in your favorite browser), because I will scrape the data to compute totals; I've already done that part using the Jsoup library.

My questions are:

1. How do I programmatically wait for the web page to finish loading in HtmlUnit?
2. How do I programmatically download the resulting web page as an HTML file?

I've already looked into the HtmlUnit docs and couldn't find a class that does what I need.

Try one of these:

webClient.waitForBackgroundJavaScript(timeoutMillis) or

webClient.waitForBackgroundJavaScriptStartingBefore(delayMillis)

Both take a duration in milliseconds as their argument.

I think you need to specify the browser as well. By default it uses IE. You can find more information in this related question: HTMLUnit doesn't wait for Javascript
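If waiting for background JavaScript turns out not to be enough, another common approach is to poll until the thing you are waiting for shows up. The following is only a minimal sketch using the standard library; the class name PollingWait and the method waitUntil are made up for illustration and are not part of HtmlUnit.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper (not an HtmlUnit class): polls a condition until it
// becomes true or a timeout expires.
final class PollingWait {
    private PollingWait() {}

    /**
     * Returns true as soon as the condition holds, or false if the timeout
     * elapses (or the thread is interrupted) before that happens.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }
}
```

With HtmlUnit, the condition would be some check that the results table is present in the page returned by click(); the exact lookup depends on the page's markup, so it is left out here.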

This example might help you. After you click, you need to wait for the page to load. Most of the time it's a dynamic page that uses JavaScript and so on. All the overridden methods are only there so that you aren't overwhelmed with console messages. You can implement whichever ones you want.

// Imports assume the HtmlUnit 2.x API (com.gargoylesoftware.htmlunit) that this
// example was written against; package names differ in other releases.
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.logging.Level;

import org.apache.commons.logging.LogFactory;
import org.w3c.css.sac.CSSException;
import org.w3c.css.sac.CSSParseException;
import org.w3c.css.sac.ErrorHandler;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.IncorrectnessListener;
import com.gargoylesoftware.htmlunit.ScriptException;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HTMLParserListener;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.javascript.JavaScriptErrorListener;

public static void main(String[] args) throws IOException {
    WebClient webClient = gethtmlUnitClient();
    final HtmlPage page = webClient.getPage("YOUR PAGE");
    // Give background JavaScript up to 60 seconds to finish.
    webClient.waitForBackgroundJavaScript(60000);
    System.out.println(page);
}

public static WebClient gethtmlUnitClient() {
    // Turn off commons-logging and java.util.logging output.
    LogFactory.getFactory().setAttribute("org.apache.commons.logging.Log",
            "org.apache.commons.logging.impl.NoOpLog");
    java.util.logging.Logger.getLogger("com.gargoylesoftware.htmlunit").setLevel(Level.OFF);
    java.util.logging.Logger.getLogger("org.apache.commons.httpclient").setLevel(Level.OFF);

    WebClient webClient = new WebClient(BrowserVersion.CHROME);

    // Every listener below is intentionally empty: it swallows the message.
    webClient.setIncorrectnessListener(new IncorrectnessListener() {
        @Override
        public void notify(String arg0, Object arg1) {}
    });
    webClient.setCssErrorHandler(new ErrorHandler() {
        @Override
        public void warning(CSSParseException arg0) throws CSSException {}

        @Override
        public void fatalError(CSSParseException arg0) throws CSSException {}

        @Override
        public void error(CSSParseException arg0) throws CSSException {}
    });
    webClient.setJavaScriptErrorListener(new JavaScriptErrorListener() {
        @Override
        public void timeoutError(HtmlPage arg0, long arg1, long arg2) {}

        @Override
        public void scriptException(HtmlPage arg0, ScriptException arg1) {}

        @Override
        public void malformedScriptURL(HtmlPage arg0, String arg1, MalformedURLException arg2) {}

        @Override
        public void loadScriptError(HtmlPage arg0, URL arg1, Exception arg2) {}
    });
    webClient.setHTMLParserListener(new HTMLParserListener() {
        @Override
        public void warning(String arg0, URL arg1, String arg2, int arg3, int arg4, String arg5) {}

        @Override
        public void error(String arg0, URL arg1, String arg2, int arg3, int arg4, String arg5) {}
    });

    // Keep going when a script on the page throws.
    webClient.getOptions().setThrowExceptionOnScriptError(false);
    return webClient;
}
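If all you want is a quieter console, the two java.util.logging lines may already go a long way, assuming HtmlUnit's logging ends up in java.util.logging on your setup (that depends on how commons-logging is configured). One detail worth knowing: the JDK's LogManager may hold loggers only weakly, so a level set on a logger you don't keep a reference to can be lost. A minimal sketch that keeps strong references; the class name QuietHtmlUnitLogging is made up for illustration:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper: silences the same two logger names used in the
// example above, using only java.util.logging.
final class QuietHtmlUnitLogging {
    // Static fields keep the loggers strongly reachable, so the level that
    // silence() sets is not lost if the LogManager only references them weakly.
    private static final Logger HTMLUNIT =
            Logger.getLogger("com.gargoylesoftware.htmlunit");
    private static final Logger HTTPCLIENT =
            Logger.getLogger("org.apache.commons.httpclient");

    private QuietHtmlUnitLogging() {}

    static void silence() {
        HTMLUNIT.setLevel(Level.OFF);
        HTTPCLIENT.setLevel(Level.OFF);
    }
}
```

Call QuietHtmlUnitLogging.silence() once before creating the WebClient. Loggers below those names inherit the OFF level unless something sets their own.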

How do I programmatically download the resulting web page as an HTML file?

Try asXml(). Something like:

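// click() returns the page that results from submitting the form
page = submit.click();
String htmlContent = page.asXml();
File htmlFile = new File("C:/index.html");
// try-with-resources closes the writer even if print() fails
try (PrintWriter pw = new PrintWriter(htmlFile)) {
    pw.print(htmlContent);
}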
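As an alternative for the file-writing part, here is a small sketch using java.nio.file.Files. PrintWriter does not throw on write errors (you would have to call checkError()), whereas Files.write reports them, and the charset is explicit. The class name HtmlSaver and the method save are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: writes an HTML string to a file as UTF-8.
final class HtmlSaver {
    private HtmlSaver() {}

    /** Creates the file or overwrites an existing one, and returns its path. */
    static Path save(String html, Path target) {
        try {
            return Files.write(target, html.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            // Surface the failure instead of silently losing the page.
            throw new UncheckedIOException(e);
        }
    }
}
```

You would pass it the string from page.asXml() and a path such as Paths.get("C:/index.html"). When you parse the file later, for example with Jsoup, read it with the same charset.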
