Handling pagination in a web page with jsoup

Question

I have been using jsoup to crawl the pages of a particular website. Basically, I am trying to extract every href that links to a PDF. I have succeeded in getting all the links on a single page, but there are 10 such pages. The site uses a JavaScript __doPostBack() function to navigate between pages. How do I do this with jsoup?

This is what I am trying right now:

Document document = Jsoup.connect(" some website name")
                        .data("__EVENTARGUMENT", __EVENTARGUMENT)
                        .data("__EVENTTARGET", __EVENTTARGET)
                        .data("__EVENTVALIDATION", __EVENTVALIDATION)
                        .data("__VIEWSTATEGENERATOR ", __VIEWSTATEGENERATOR)
                        .cookie("ASP.NET_SessionId", sessionId)
                        .followRedirects(true)
                        .timeout(0)
                        .userAgent(
                            "Mozilla/5.0 (Windows; U; WindowsNT 5.1; en-US; rv1.8.1.6) Gecko/20070725 Firefox/2.0.0.6")
                        .post();

But I am getting the wrong URL in the output. I have defined all the variables before sending the request.

Answer 1

When I hit this kind of problem, here is how I solve it:

Load the page in a browser.
Inspect the HTTP messages exchanged between the browser and the server while paging through the results (Fiddler, Firebug, the browser's developer console/toolbar, ...).
Identify every byte the browser and server exchange (headers, cookies, etc.).
Once every byte is identified, try to go through the pages with hurl.it (enter the headers, cookies, user agent, etc.).
Once you succeed in going through the pages with hurl.it, instruct jsoup to do the same.
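As a rough illustration of the last step, here is a minimal sketch of how the form fields for one postback might be assembled in Java. It assumes the pager links look like javascript:__doPostBack('target','argument'), which is typical of ASP.NET WebForms but is not confirmed for the site in the question. The class name PostBackForm, the control name ctl00$Grid, the argument Page$2 and the hidden-field value are all hypothetical placeholders. The hidden fields (typically __VIEWSTATE, __VIEWSTATEGENERATOR and __EVENTVALIDATION) would have to be read from the page you just fetched, exactly as the captured browser traffic shows them.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of jsoup): builds the form fields for one postback.
final class PostBackForm {

    // Matches href values such as javascript:__doPostBack('ctl00$Grid','Page$2')
    private static final Pattern POSTBACK =
            Pattern.compile("__doPostBack\\(\\s*'([^']*)'\\s*,\\s*'([^']*)'\\s*\\)");

    /** Returns {eventTarget, eventArgument}, or null if href is not a postback link. */
    static String[] parse(String href) {
        Matcher m = POSTBACK.matcher(href);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    /** Copies the hidden fields of the current page and adds the two event fields. */
    static Map<String, String> formData(Map<String, String> hiddenFields, String href) {
        String[] postBack = parse(href);
        if (postBack == null) {
            throw new IllegalArgumentException("Not a __doPostBack link: " + href);
        }
        Map<String, String> data = new LinkedHashMap<>(hiddenFields);
        data.put("__EVENTTARGET", postBack[0]);
        data.put("__EVENTARGUMENT", postBack[1]);
        return data;
    }

    public static void main(String[] args) {
        // Placeholder value; a real one comes from the hidden input on the fetched page.
        Map<String, String> hidden = new LinkedHashMap<>();
        hidden.put("__VIEWSTATE", "placeholder-viewstate");
        System.out.println(
                formData(hidden, "javascript:__doPostBack('ctl00$Grid','Page$2')"));
    }
}
```

The resulting map could then be passed to jsoup's Connection.data(Map<String, String>) before calling post(), together with the same cookies and headers the browser sent. Whether that is enough depends on what the captured traffic shows for the site in question.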

Answer 1
0 2015-01-22 12:50:40
