Unable to navigate aspx pages via jsoup while scraping
I am scraping the URL http://nvsos.gov/sosentitysearch/CorpSearch.aspx with jsoup. I can scrape the first page of results, but I cannot navigate to the second page.
Here is the code snippet:
try
{
String url = "http://nvsos.gov/sosentitysearch/CorpSearch.aspx";
Connection.Response response = Jsoup.connect(url).method(Connection.Method.GET).execute();
Document responseDocument = response.parse();
Element eventValidation = responseDocument.select("input[name=__EVENTVALIDATION]").first();
Element viewState = responseDocument.select("input[name=__VIEWSTATE]").first();
//javascript:__doPostBack('ctl00$MainContent$objSearchGrid$dgCorpSearchResults$ctl54$ctl01','')
response = Jsoup.connect(url)
.data("__VIEWSTATE", viewState.attr("value"))
.data("__EVENTVALIDATION", eventValidation.attr("value"))
.data("ctl00$MainContent$txtSearchBox", "apple") // <- search
.data("ctl00$MainContent$btnCorpSearch", "Search")
.data("ctl00$MainContent$ddlCorpSortColumns", "m")
.data("ctl00$MainContent$ddlCorpNumSortColumns", "m")
.data("ctl00$MainContent$ddlOfficerSortColumns", "m")
.data("ctl00$MainContent$ddlRASortColumns", "m")
.data("ctl00$MainContent$ddlABNSortColumns", "m")
.data("ctl00$MainContent$rdlSortOrder", "d")
.data("ctl00$MainContent$objSearchGrid$dgCorpSearchResults$ctl54$ctl01", "")
.method(Connection.Method.POST)
.followRedirects(true)
.execute();
Document document = response.parse(); //search results
System.out.println(document);
}
catch (IOException e)
{
e.printStackTrace();
}
Here, .data("ctl00$MainContent$objSearchGrid$dgCorpSearchResults$ctl54$ctl01", "")
is meant to navigate to the second page, but the request always returns the first page.
You may be missing some cookies. Try the code below:
response = Jsoup.connect(url)
.cookies(response.cookies()) // Add cookies received when fetching the first page
.data("__VIEWSTATE", viewState.attr("value"))
.data("__EVENTVALIDATION", eventValidation.attr("value"))
.data("ctl00$MainContent$txtSearchBox", "apple") // <- search
.data("ctl00$MainContent$btnCorpSearch", "Search")
.data("ctl00$MainContent$ddlCorpSortColumns", "m")
.data("ctl00$MainContent$ddlCorpNumSortColumns", "m")
.data("ctl00$MainContent$ddlOfficerSortColumns", "m")
.data("ctl00$MainContent$ddlRASortColumns", "m")
.data("ctl00$MainContent$ddlABNSortColumns", "m")
.data("ctl00$MainContent$rdlSortOrder", "d")
.data("ctl00$MainContent$objSearchGrid$dgCorpSearchResults$ctl54$ctl01", "")
.method(Connection.Method.POST)
.followRedirects(true)
.execute();
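If the cookies alone do not help, one more thing may be worth checking. In ASP.NET Web Forms, a link such as javascript:__doPostBack('target','argument') submits the form with the hidden fields __EVENTTARGET and __EVENTARGUMENT set to those two values. The control name is not sent as a form field of its own. The following is a minimal sketch, using only the Java standard library, that extracts these two values from a pager link like the one in the commented line of the question. The class name PostBackFields and the method fromHref are hypothetical names made up for this illustration, and whether this particular site needs anything beyond these fields has not been verified.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns a __doPostBack link into the two hidden form
// fields that ASP.NET Web Forms expects on a postback.
class PostBackFields {

    // Matches __doPostBack('target','argument') with single-quoted arguments.
    private static final Pattern DO_POST_BACK =
            Pattern.compile("__doPostBack\\('([^']*)'\\s*,\\s*'([^']*)'\\)");

    static Map<String, String> fromHref(String href) {
        Matcher m = DO_POST_BACK.matcher(href);
        if (!m.find()) {
            throw new IllegalArgumentException("Not a __doPostBack link: " + href);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("__EVENTTARGET", m.group(1));   // control that raised the event
        fields.put("__EVENTARGUMENT", m.group(2)); // often empty for pager links
        return fields;
    }

    public static void main(String[] args) {
        // Pager link copied from the comment in the question's code.
        String href = "javascript:__doPostBack("
                + "'ctl00$MainContent$objSearchGrid$dgCorpSearchResults$ctl54$ctl01','')";
        System.out.println(fromHref(href));
    }
}
```

The resulting map can be passed to jsoup with .data(fields) in place of the .data("ctl00$MainContent$objSearchGrid$dgCorpSearchResults$ctl54$ctl01", "") line. As a further assumption, the paging request would probably need the __VIEWSTATE and __EVENTVALIDATION values parsed from the search results page rather than from the initial GET, and would omit the ctl00$MainContent$btnCorpSearch field so that the server does not treat the request as a new search.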