
JSOUP - How to crawl a “login required” page using JSOUP

I'm having trouble crawling a particular website. The problem is: after successfully logging in to that website, I can't access a link that requires a valid login.

For example:

public Document executeLogin(String user, String password) {
    try {
        // Fetch the login page first to obtain the initial cookies.
        Connection.Response loginForm = Jsoup.connect(url)
                .method(Connection.Method.GET)
                .execute();

        // Post the credentials to the login validation URL.
        Document mainPage = Jsoup.connect(loginValidationUrl)
                .data("user", user)
                .data("senha", password)
                .cookies(loginForm.cookies())
                .post();

        // Request the page that requires a valid login.
        Document evaluationPage = Jsoup.connect(loginRequiredUrl)
                .get();

        return evaluationPage;
    } catch (IOException ioe) {
        return null;
    }
}

What I do here is:

  • Get the cookies from the login page, so I can log in properly;
  • Then post to the login validation URL, which returns the main page after login;
  • Finally, try to access the login-required URL after logging in, but that request returns the login page, as if the session had expired.

I know I have to store cookies to keep the session alive, but when I connect to the login validation URL, it returns a Document object, and there are no cookies to get from that object.

Is there any way to get the "session" created by the successful login and send it with other Jsoup.connect calls? What I want to do is crawl a page that can only be accessed by logged-in users.

Thank you very much in advance.

Get the cookies after you log in:

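    // Fetch the login page to obtain the initial cookies.
    Connection.Response loginForm = Jsoup.connect(url)
            .method(Connection.Method.GET)
            .execute();

    // Submit the credentials as a POST; execute() returns a Response, which exposes the cookies.
    Connection.Response mainPage = Jsoup.connect(loginValidationUrl)
            .data("user", user)
            .data("senha", password)
            .cookies(loginForm.cookies())
            .method(Connection.Method.POST)
            .execute();

    // Cookies set by the login response.
    Map<String, String> cookies = mainPage.cookies();

    // Send those cookies with the request for the protected page.
    Document evaluationPage = Jsoup.connect(loginRequiredUrl)
            .cookies(cookies)
            .execute()
            .parse();

    return evaluationPage;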

When you request the second page, you also have to send the cookies.

(Source: I had this problem a few days ago)

So it's easier to just put the cookies in a Map:

Map<String, String> cookies = loginForm.cookies();

And submit the forms using these cookies.
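
One way to keep such a Map up to date across several requests is a small holder that merges the cookies from every response. The sketch below is only an illustration: `SessionCookies`, `update` and `asMap` are hypothetical names, not part of jsoup, and the cookie names and values are made up. It deliberately contains no jsoup calls, so it runs on the JDK alone (Java 9 or later for `Map.of`). In real code you would presumably call `update(response.cookies())` after each `execute()` and pass `asMap()` to `.cookies(...)` on the next request.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of jsoup): accumulates cookies across responses. */
final class SessionCookies {
    private final Map<String, String> cookies = new LinkedHashMap<>();

    /** Merges the cookies of one response; a later value replaces an earlier one with the same name. */
    public SessionCookies update(Map<String, String> responseCookies) {
        cookies.putAll(responseCookies);
        return this;
    }

    /** Read-only view, in the Map<String, String> shape that Connection.cookies(...) accepts. */
    public Map<String, String> asMap() {
        return Collections.unmodifiableMap(cookies);
    }

    public static void main(String[] args) {
        // Made-up values standing in for loginForm.cookies() and mainPage.cookies().
        Map<String, String> fromLoginForm = Map.of("JSESSIONID", "abc123", "lang", "pt");
        Map<String, String> fromLoginPost = Map.of("JSESSIONID", "def456");

        SessionCookies session = new SessionCookies()
                .update(fromLoginForm)
                .update(fromLoginPost);

        System.out.println(session.asMap().get("JSESSIONID")); // prints "def456"
        System.out.println(session.asMap().get("lang"));       // prints "pt"
    }
}
```

Merging rather than replacing matters when the login response sets only some of the cookies the site expects. Newer jsoup releases (1.14.1 and later, as far as I know) also provide `Jsoup.newSession()`, which keeps cookies between requests for you, so a helper like this is mainly useful on older versions.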
