HtmlUnit HtmlSubmitInput.click() results in "Incorrect URL" corrected to "cgi-bin", which then leads to an UnknownHostException

I am trying to write a small bot that opens http://lsa.colorado.edu/cgi-bin/LSA-pairwise.html, enters some text in the textarea, presses the submit button, and fetches the resulting page. This is for a linguistics project. However, when I call click() on the HtmlSubmitInput button, the target URL appears to be malformed, as IncorrectnessListenerImpl reports:

Apr 10, 2016 2:38:35 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNUNG: Incorrect URL "http:/cgi-bin/LSA-pairwise-x.html" has been corrected

The URL should be:

http://lsa.colorado.edu/cgi-bin/LSA-pairwise-x.html

This then leads to the following stack trace (shortened):

Exception in thread "main" java.lang.RuntimeException: java.net.UnknownHostException: cgi-bin: unknown error
    at com.gargoylesoftware.htmlunit.WebClient.download(WebClient.java:2078)
    at com.gargoylesoftware.htmlunit.html.HtmlForm.submit(HtmlForm.java:141)
    at com.gargoylesoftware.htmlunit.html.HtmlSubmitInput.doClickStateUpdate(HtmlSubmitInput.java:90)
    at com.gargoylesoftware.htmlunit.html.DomElement.click(DomElement.java:795)
    at com.gargoylesoftware.htmlunit.html.DomElement.click(DomElement.java:742)
    at com.gargoylesoftware.htmlunit.html.DomElement.click(DomElement.java:689)
    at LSABot.submitInput(LSABot.java:30)
    at Start.main(Start.java:8)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
 [...]

My guess is that HtmlUnit tries to fix the URL, but the fix leaves "cgi-bin" as the host, which of course cannot be resolved. I have searched repeatedly but have not found anything relevant to my issue.

My LSABot class:

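import java.io.IOException;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;
import com.gargoylesoftware.htmlunit.html.HtmlTextArea;

public class LSABot {
    final WebClient webClient;
    private HtmlPage mainPg, rsltPg;
    private HtmlForm htmlForm;
    private HtmlTextArea txtA;
    private HtmlSubmitInput submitBt;

    public LSABot () throws Exception {
        this.webClient = new WebClient(BrowserVersion.CHROME);
        this.webClient.getOptions().setJavaScriptEnabled(true);
        // Load the page and look up the first form, its textarea and its submit button.
        this.mainPg = this.webClient.getPage("http://lsa.colorado.edu/cgi-bin/LSA-pairwise.html");
        this.htmlForm = this.mainPg.getForms().get(0);
        this.txtA = this.htmlForm.getTextAreaByName("txt1");
        this.submitBt = this.htmlForm.getInputByValue("Submit Texts");
    }

    public void submitInput(String input) {
        this.txtA.setText(input);
        try {
            // This click() is where the UnknownHostException is thrown.
            this.rsltPg = this.submitBt.click();
            this.webClient.waitForBackgroundJavaScript(30*1000);
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }
}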

The error comes from the HTML of the form itself. Its action attribute is http:/cgi-bin/LSA-pairwise-x.html (a single slash after the scheme, so no host), when it should be http://lsa.colorado.edu/cgi-bin/LSA-pairwise-x.html.
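To illustrate why the single slash matters, here is a minimal sketch using only java.net.URI. It assumes that the "correction" reported in the log amounts to inserting the missing slash after the scheme; that is inferred from the warning and the UnknownHostException above, not taken from HtmlUnit's source. The class and method names (UrlHostDemo, insertMissingSlash, hostOf) are made up for this example.

```java
import java.net.URI;

class UrlHostDemo {
    // Assumption: the correction turns "http:/x" into "http://x".
    static String insertMissingSlash(String url) {
        if (url.startsWith("http:/") && !url.startsWith("http://")) {
            return "http://" + url.substring("http:/".length());
        }
        return url;
    }

    // Returns the host component of the URL, or null if it has none.
    static String hostOf(String url) {
        return URI.create(url).getHost();
    }

    public static void main(String[] args) {
        String raw = "http:/cgi-bin/LSA-pairwise-x.html";
        String corrected = insertMissingSlash(raw);

        // With a single slash there is no authority part, so no host at all.
        System.out.println(hostOf(raw));        // null
        // After the slash is inserted, the first path segment becomes the host.
        System.out.println(hostOf(corrected));  // cgi-bin
        // The URL the form was meant to point at.
        System.out.println(hostOf("http://lsa.colorado.edu/cgi-bin/LSA-pairwise-x.html")); // lsa.colorado.edu
    }
}
```

Under that assumption the request goes to a host literally named cgi-bin, which matches the UnknownHostException in the stack trace.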

To work around it, overwrite the form's action attribute before clicking the button. This code should work:

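import java.util.logging.Level;

import org.apache.commons.logging.LogFactory;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;
import com.gargoylesoftware.htmlunit.html.HtmlTextArea;

// Optional: silence HtmlUnit's verbose logging.
LogFactory.getFactory().setAttribute("org.apache.commons.logging.Log", "org.apache.commons.logging.impl.NoOpLog");

java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(Level.OFF);
java.util.logging.Logger.getLogger("org.apache.commons.httpclient").setLevel(Level.OFF);

WebClient client = new WebClient(BrowserVersion.CHROME);
client.getOptions().setJavaScriptEnabled(true);
// Do not abort on script errors or non-2xx responses from the site.
client.getOptions().setThrowExceptionOnScriptError(false);
client.getOptions().setThrowExceptionOnFailingStatusCode(false);

String url = "http://lsa.colorado.edu/cgi-bin/LSA-pairwise.html";
final HtmlPage page = client.getPage(url);

HtmlForm htmlForm = page.getForms().get(0);
HtmlTextArea txtA = htmlForm.getTextAreaByName("txt1");
txtA.setText("hello");
HtmlSubmitInput submitBt = htmlForm.getInputByValue("Submit Texts");

// change the form action attribute to the correct one
htmlForm.setAttribute("action", "http://lsa.colorado.edu/cgi-bin/LSA-pairwise-x.html");

HtmlPage page2 = submitBt.click();
System.out.println(page2.asText());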
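
If you would rather not hard-code the corrected URL, one option is to derive it from the page's own URL. The following is only a sketch using java.net.URI: ActionRepair and repair are hypothetical names, not part of HtmlUnit, and the rule it applies (an http/https action without a host is treated as a path on the page's host) is an assumption that happens to fit this site.

```java
import java.net.URI;

class ActionRepair {
    // Hypothetical helper: resolve a form action against the URL of the page it is on.
    // An http(s) action that has a scheme but no host (e.g. "http:/cgi-bin/x.html")
    // is treated as a host-less path and resolved against the page's host.
    static URI repair(URI pageUrl, String action) {
        URI parsed = URI.create(action);
        String scheme = parsed.getScheme();
        boolean isHttp = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);

        if (isHttp && parsed.getHost() == null && parsed.getRawPath() != null) {
            String ref = parsed.getRawPath();
            if (parsed.getRawQuery() != null) {
                ref += "?" + parsed.getRawQuery();
            }
            return pageUrl.resolve(ref);
        }
        // Ordinary relative or well-formed absolute actions resolve as usual.
        return pageUrl.resolve(parsed);
    }

    public static void main(String[] args) {
        URI page = URI.create("http://lsa.colorado.edu/cgi-bin/LSA-pairwise.html");
        // prints http://lsa.colorado.edu/cgi-bin/LSA-pairwise-x.html
        System.out.println(repair(page, "http:/cgi-bin/LSA-pairwise-x.html"));
    }
}
```

In the snippet above, the result of repair(...) could then be passed to htmlForm.setAttribute("action", ...) in place of the literal string, with the page URL and the form's current action attribute as inputs.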
