JSOUP - How to crawl a “login required” page using JSOUP
I'm having trouble crawling a particular website. The problem is: after successfully logging in to that website, I can't access a link which requires a valid login.
For example:
public Document executeLogin(String user, String password) {
    try {
        // Load the login page to pick up the initial cookies
        Connection.Response loginForm = Jsoup.connect(url)
                .method(Connection.Method.GET)
                .execute();
        // Post the credentials together with those cookies
        Document mainPage = Jsoup.connect(login-validation-url)
                .data("user", user)
                .data("senha", password)
                .cookies(loginForm.cookies())
                .post();
        // Request the page that requires a valid login
        Document evaluationPage = Jsoup.connect(login-required-url)
                .get();
        return evaluationPage;
    } catch (IOException ioe) {
        return null;
    }
}
What I do here is:
I know I have to store cookies to keep the session alive, but when I connect to the login validation URL, it returns a Document object, and there are no cookies to get from that object.
Is there any way to get the "session" created by the successful login and send it with the other Jsoup.connect calls? What I want to do is crawl a page that can only be accessed by logged-in users.
Thank you very much in advance.
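Get the cookies after you log in:
Connection.Response loginForm = Jsoup.connect(url)
        .method(Connection.Method.GET)
        .execute();
// Submit the credentials with POST and keep the Response (not just the Document),
// because the Response is what exposes the cookies.
Connection.Response mainPage = Jsoup.connect(login-validation-url)
        .data("user", user)
        .data("senha", password)
        .cookies(loginForm.cookies())
        .method(Connection.Method.POST)
        .execute();
// Cookies set by the login response
Map<String, String> cookies = mainPage.cookies();
// Send the same cookies with the request for the protected page
Document evaluationPage = Jsoup.connect(login-required-url)
        .cookies(cookies)
        .execute().parse();
return evaluationPage;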
When you request the second page, you also have to send the cookies, as the last request above does.
(Source: I had this problem a few days ago)
So it's easier to just put the cookies in a Map:
Map<String, String> cookies = loginForm.cookies();
And submit the forms using these cookies.
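The two answers take the cookies from different responses: one from the login POST (mainPage), the other from the initial GET (loginForm). Some sites set the session cookie on the first request and do not repeat it after login, so a cautious option is to send both sets. Below is a minimal sketch of that idea, using only the JDK; SessionCookies and merge are hypothetical names introduced for illustration and are not part of jsoup, and the cookie names and values in main are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of jsoup): combines the cookie maps
// returned by several responses into one map.
final class SessionCookies {
    private SessionCookies() {}

    /**
     * Returns a new map holding every cookie from the given maps.
     * Later maps win when the same cookie name appears more than once,
     * so pass the most recent response last. Null maps are skipped.
     */
    @SafeVarargs
    static Map<String, String> merge(Map<String, String>... cookieMaps) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> cookies : cookieMaps) {
            if (cookies != null) {
                merged.putAll(cookies);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Stand-ins for loginForm.cookies() and mainPage.cookies();
        // these names and values are invented for the example.
        Map<String, String> fromLoginForm = Map.of("JSESSIONID", "abc123");
        Map<String, String> fromLoginPost = Map.of("auth", "token-1");

        Map<String, String> cookies = merge(fromLoginForm, fromLoginPost);
        System.out.println(cookies);
    }
}
```

With jsoup, this would be called as SessionCookies.merge(loginForm.cookies(), mainPage.cookies()), and the result passed to .cookies(...) on the request for the login-required page.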