
How to fill a form with Jsoup?

I am trying to navigate to the description page of the California website http://kepler.sos.ca.gov/, but I am unable to get there.

I have an HTML form on which I am submitting the request. I am unable to add the form here, but it is simply a POST request to http://kepler.sos.ca.gov/ with the required params.

I am able to get __EVENTTARGET and __EVENTARGUMENT from the previous page, the one I came here from.

What am I doing wrong?

Code:

String url = "kepler.sos.ca.gov/";
Connection.Response resp = Jsoup.connect(url)
                                .timeout(30000)
                                .method(Connection.Method.GET) 
                                .execute();
Document responseDocument = resp.parse();
Map<String, String> loginCookies = resp.cookies();
eventValidation = responseDocument.select("input[name=__EVENTVALIDATION]").first();
viewState = responseDocument.select("input[name=__VIEWSTATE]").first();

You want to use FormElement. This is a useful feature of Jsoup: it is able to find the fields declared inside a form and post them for you. Before posting the form, you can set the value of the fields using the Jsoup API.
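To see what FormElement collects without touching the network, here is a minimal offline sketch. The HTML string, its field names, and the FormDataDemo class are made up for illustration; only Jsoup.parse, Element#val and FormElement#formData come from Jsoup itself.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.FormElement;

import java.util.LinkedHashMap;
import java.util.Map;

public class FormDataDemo {

    // Hypothetical form, loosely modelled on an ASP.NET page with a hidden state field.
    static final String HTML =
        "<form id='aspnetForm' method='post' action='/search'>"
      + "<input type='hidden' name='__VIEWSTATE' value='abc123'>"
      + "<input type='text' name='query' value=''>"
      + "<input type='radio' name='kind' value='corp'>"
      + "</form>";

    // Parses the form, fills in the text box, and returns the fields Jsoup would send.
    static Map<String, String> collect(String query) {
        Document doc = Jsoup.parse(HTML, "http://example.com/");
        FormElement form = (FormElement) doc.select("form#aspnetForm").first();
        form.select("[name=query]").first().val(query);

        Map<String, String> fields = new LinkedHashMap<>();
        for (Connection.KeyVal kv : form.formData()) {
            fields.put(kv.key(), kv.value());
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(collect("cali"));
    }
}
```

The hidden __VIEWSTATE field is carried along without any extra code, which is why there is no need to extract it by hand. The unchecked radio button is left out of the data, much as a browser would do.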

Note:

In the sample code below, you'll always see calls to the Element#select method followed by a call to the Elements#first method.

For example: responseDocument.select("form#aspnetForm").first()

Jsoup 1.11.1 introduced a more efficient alternative: Element#selectFirst. You can use it as a direct replacement for the original form.

For example:
responseDocument.select("form#aspnetForm").first()
can be replaced by
responseDocument.selectFirst("form#aspnetForm")
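A quick way to convince yourself that the two forms are interchangeable is to run both against the same document. This is only a sketch: the HTML snippet and the SelectFirstDemo class are invented for the example, and it assumes Jsoup 1.11.1 or later is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SelectFirstDemo {

    // Hypothetical markup; any document would do.
    static final String HTML = "<form id='aspnetForm'><input name='a'></form>";

    // Returns true when both lookups yield the same node (or both yield null).
    static boolean sameResult(String css) {
        Document doc = Jsoup.parse(HTML);
        Element viaSelect = doc.select(css).first();
        Element viaSelectFirst = doc.selectFirst(css);
        return viaSelect == viaSelectFirst;
    }

    public static void main(String[] args) {
        System.out.println(sameResult("form#aspnetForm"));
        System.out.println(sameResult("form#doesNotExist"));
    }
}
```

Both variants return null when nothing matches, so a null check such as the checkElement helper used in the sample code applies to either one.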

SAMPLE CODE

// * Connect to website
String url = "http://kepler.sos.ca.gov/";
Connection.Response resp = Jsoup.connect(url) //
                                .timeout(30000) //
                                .method(Connection.Method.GET) //
                                .execute();

// * Find the form
Document responseDocument = resp.parse();
Element potentialForm = responseDocument.select("form#aspnetForm").first();
checkElement("form element", potentialForm);
FormElement form = (FormElement) potentialForm;

// * Fill in the form and submit it
// ** Search Type
Element radioButtonListSearchType = form.select("[name$=RadioButtonList_SearchType]").first();
checkElement("search type radio button list", radioButtonListSearchType);
radioButtonListSearchType.attr("checked", "checked");

// ** Name search
Element textBoxNameSearch = form.select("[name$=TextBox_NameSearch]").first();
checkElement("name search text box", textBoxNameSearch);
textBoxNameSearch.val("cali");

// ** Submit the form
Document searchResults = form.submit().cookies(resp.cookies()).post();

// * Extract results (entity numbers in this sample code)
for (Element entityNumber : searchResults.select("table[id$=SearchResults_Corp] > tbody > tr > td:first-of-type:not(td[colspan=5])")) {
    System.out.println(entityNumber.text());
}

public static void checkElement(String name, Element elem) {
    if (elem == null) {
        throw new RuntimeException("Unable to find " + name);
    }
}

OUTPUT (as of this writing)

C3036475
C3027305
C3236514
C3027304
C3034012
C3035110
C3028330
C3035378
C3124793
C3734637

See also:

In this example, we will log into the GitHub website by using the FormElement class.

// # Constants used in this example
final String USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"; 
final String LOGIN_FORM_URL = "https://github.com/login";
final String USERNAME = "yourUsername";  
final String PASSWORD = "yourPassword";  

// # Go to login page
Connection.Response loginFormResponse = Jsoup.connect(LOGIN_FORM_URL)
                                             .method(Connection.Method.GET)
                                             .userAgent(USER_AGENT)
                                             .execute();  

// # Fill the login form
// ## Find the form first...
FormElement loginForm = (FormElement)loginFormResponse.parse()
                                         .select("div#login > form").first();
checkElement("Login Form", loginForm);

// ## ... then "type" the username ...
Element loginField = loginForm.select("#login_field").first();
checkElement("Login Field", loginField);
loginField.val(USERNAME);

// ## ... and "type" the password
Element passwordField = loginForm.select("#password").first();
checkElement("Password Field", passwordField);
passwordField.val(PASSWORD);        


// # Now send the form for login
Connection.Response loginActionResponse = loginForm.submit()
         .cookies(loginFormResponse.cookies())
         .userAgent(USER_AGENT)  
         .execute();

System.out.println(loginActionResponse.parse().html());

public static void checkElement(String name, Element elem) {
    if (elem == null) {
        throw new RuntimeException("Unable to find " + name);
    }
}

All the form data is handled by the FormElement class for us (even the form method detection). A ready-made Connection is built when invoking the FormElement#submit method. All we have to do is complete this connection with additional headers (cookies, user-agent, etc.) and execute it.
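The connection returned by FormElement#submit can be inspected before it is executed, which is handy when a submission does not behave as expected. The sketch below stays offline; the markup, the example.com URLs, the cookie value and the SubmitInspectDemo class are all assumptions made for the illustration.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.FormElement;

public class SubmitInspectDemo {

    // Hypothetical login form with a relative action URL.
    static final String HTML =
        "<form method='post' action='/session'>"
      + "<input type='text' name='login' value='alice'>"
      + "</form>";

    // Builds the request from the form, adds headers, but never executes it.
    static Connection.Request prepare() {
        // A base URI is needed so that the relative action can be resolved.
        FormElement form = (FormElement) Jsoup.parse(HTML, "https://example.com/login")
                                              .select("form").first();
        Connection conn = form.submit()
                              .userAgent("Mozilla/5.0")
                              .cookie("session", "xyz");
        return conn.request();
    }

    public static void main(String[] args) {
        Connection.Request req = prepare();
        System.out.println(req.method() + " " + req.url());
        for (Connection.KeyVal kv : req.data()) {
            System.out.println(kv.key() + "=" + kv.value());
        }
    }
}
```

The method and target URL come from the form's method and action attributes, so nothing about the endpoint has to be hard-coded in the calling code.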

This is the exact same code as posted above in the accepted answer, except that it reflects the changes California made to their website after the original answer was posted. As of this writing, this code works. I've updated the original comments to identify each change.

// * Connect to website (Original URL: http://kepler.sos.ca.gov/)
String url = "https://businesssearch.sos.ca.gov/";
Connection.Response resp = Jsoup.connect(url) //
                                .timeout(30000) //
                                .method(Connection.Method.GET) //
                                .execute();

// * Find the form (Original jsoup selector: form#aspnetForm)
Document responseDocument = resp.parse();
Element potentialForm = responseDocument.select("form#formSearch").first();
checkElement("form element", potentialForm);
FormElement form = (FormElement) potentialForm;

// * Fill in the form and submit it
// ** Search Type (Original jsoup selector: [name$=RadioButtonList_SearchType])
Element radioButtonListSearchType = form.select("[name$=SearchType]").first();
checkElement("search type radio button list", radioButtonListSearchType);
radioButtonListSearchType.attr("checked", "checked");

// ** Name search (Original jsoup selector: name$=TextBox_NameSearch)
Element textBoxNameSearch = form.select("[name$=SearchCriteria]").first();
checkElement("name search text box", textBoxNameSearch);
textBoxNameSearch.val("cali");

// ** Submit the form
Document searchResults = form.submit().cookies(resp.cookies()).post();

// * Extract results (entity numbers in this sample code, original jsoup selector: id$=SearchResults_Corp)
for (Element entityNumber : searchResults.select("table[id$=enitityTable] > tbody > tr > td:first-of-type:not(td[colspan=5])")) {
    System.out.println(entityNumber.text());
}
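One caveat about the radio-button step: the samples mark the first matching input as checked, but they do not uncheck any other radio button in the same group. As far as I can tell, FormElement#formData includes every radio input that carries a checked attribute, so a pre-checked default could be sent alongside your choice. A hedged sketch of picking one option by value follows; the SearchType values CORP and LPLLC and the RadioSelectDemo class are hypothetical.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.FormElement;

import java.util.ArrayList;
import java.util.List;

public class RadioSelectDemo {

    // Hypothetical radio group where the first option is pre-checked by the page.
    static final String HTML =
        "<form action='/search'>"
      + "<input type='radio' name='SearchType' value='CORP' checked>"
      + "<input type='radio' name='SearchType' value='LPLLC'>"
      + "</form>";

    // Checks the radio whose value equals 'wanted', unchecks the rest,
    // and returns the SearchType values that would be submitted.
    static List<String> submittedValues(String wanted) {
        FormElement form = (FormElement) Jsoup.parse(HTML, "http://example.com/")
                                              .select("form").first();
        for (Element radio : form.select("input[type=radio][name=SearchType]")) {
            if (radio.val().equals(wanted)) {
                radio.attr("checked", "checked");
            } else {
                radio.removeAttr("checked");
            }
        }
        List<String> values = new ArrayList<>();
        for (Connection.KeyVal kv : form.formData()) {
            if (kv.key().equals("SearchType")) {
                values.add(kv.value());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(submittedValues("LPLLC"));
    }
}
```

If the first radio button happens to be the one you want and nothing else is pre-checked, the simpler approach in the samples is enough.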
