
Complex retrieval of a web document

I need to retrieve a document from a website and parse it. The problem is that:

  1. The site uses both HTTP and HTTPS.
  2. You need to log in to the site (I have a regular account).
  3. Starting from the login page, there are at least two redirects before you are actually logged in.

I managed to open an HTTPS connection and post my login and password, but I'm having trouble with cookie management and the redirects.

commons-httpclient would help here.

Using a library like HtmlUnit might help.
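
For the cookie and redirect part specifically, here is a minimal sketch that uses the JDK's built-in `java.net.http.HttpClient` (Java 11+) rather than commons-httpclient or HtmlUnit. The idea is the same with any of them: keep one client for the whole session so cookies set during login are sent back later, and let the client follow the redirects. The class name `LoginFetch`, the method names, and the form field names used in the usage note are all made up for illustration; the real field names have to be read from the site's login form.

```java
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of any library.
final class LoginFetch {

    // One client per session: the CookieManager stores cookies received
    // during login and attaches them to every later request.
    static HttpClient newSessionClient() {
        return HttpClient.newBuilder()
                .cookieHandler(new CookieManager(null, CookiePolicy.ACCEPT_ALL))
                // ALWAYS also follows https -> http redirects;
                // Redirect.NORMAL would refuse that downgrade.
                .followRedirects(HttpClient.Redirect.ALWAYS)
                .build();
    }

    // Encode form fields as application/x-www-form-urlencoded.
    static String formEncode(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Post the login form, then fetch the document with the same client
    // so the session cookies are reused. Returns the document body.
    static String loginAndFetch(HttpClient client, URI loginUri,
                                Map<String, String> formFields,
                                URI documentUri) throws Exception {
        HttpRequest login = HttpRequest.newBuilder(loginUri)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formEncode(formFields)))
                .build();
        // The login response body is not needed, only the cookies it sets.
        client.send(login, HttpResponse.BodyHandlers.discarding());

        HttpRequest fetch = HttpRequest.newBuilder(documentUri).GET().build();
        return client.send(fetch, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

Usage would look something like `LoginFetch.loginAndFetch(LoginFetch.newSessionClient(), loginUri, Map.of("login", user, "pass", password), documentUri)`, where `login` and `pass` are placeholder field names. If the login page relies on JavaScript or hidden form tokens, a plain HTTP client like this is probably not enough, which is where something like HtmlUnit comes in.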

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.
