
Logging in and navigating using Jsoup

I am trying to log in to a website using Jsoup. My goal is to scrape some data from the site, but I am having problems with logging in and navigating.

The code currently looks like this:

    try {
        Connection.Response response = Jsoup.connect("https://app.northpass.com/login")
                .method(Connection.Method.GET)
                .execute();

        response = Jsoup.connect("https://app.northpass.com/login")
                .data("educator[email]", "email123")
                .data("educator[password]", "password123")
                .cookies(response.cookies())
                .method(Connection.Method.POST)
                .execute();

        // Go to new page
        Document coursePage = Jsoup.connect("https://app.northpass.com/course")
                .cookies(response.cookies())
                .get();

        System.out.println(coursePage.title());

    } catch (IOException e) {
        e.printStackTrace();
    }

I have also tried adding

.data("commit", "Log in")

and

.userAgent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.21 (KHTML, like Gecko) Chrome/19.0.1042.0 Safari/535.21")

without any success.

The error I get is as follows:

org.jsoup.HttpStatusException: HTTP error fetching URL. Status=500, URL=https://app.northpass.com/login

From what I have read in other threads, people suggest setting a userAgent (which, as mentioned above, I have already tried). Thanks in advance for any help.

If you look at the network traffic when you log in from your browser, you'll see that an additional item of data is sent: authenticity_token. This is a hidden field in the form.

You will then need to extract it from the initial response and send it with the POST request:

try {
    Connection.Response response = Jsoup.connect("https://app.northpass.com/login")
            .method(Connection.Method.GET)
            .execute();

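    // I can't test this, but it should be something like the following;
    // see https://jsoup.org/cookbook/extracting-data/selector-syntax
    Document document = response.parse();
    // Select the field by name: "input[hidden]" would match a `hidden` attribute, not type="hidden"
    String token = document.select("input[name=authenticity_token]").first().val();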

    response = Jsoup.connect("https://app.northpass.com/login")
            .data("educator[email]", "email123")
            .data("educator[password]", "password123")
            .data("authenticity_token", token)
            .cookies(response.cookies())
            .method(Connection.Method.POST)
            .execute();

    // Go to new page
    Document coursePage = Jsoup.connect("https://app.northpass.com/course")
            .cookies(response.cookies())
            .get();

    System.out.println(coursePage.title());

} catch (IOException e) {
    e.printStackTrace();
}
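
If the form turns out to carry more than one hidden field, a more general approach is to collect all of them from the login page rather than picking out a single token. The following is only a sketch under assumptions: the class name `HiddenFields`, the helper `hiddenFields`, and the sample HTML (including the `return_to` field) are made up for illustration and are not taken from the real login page.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.LinkedHashMap;
import java.util.Map;

public class HiddenFields {

    // Collects the name/value pairs of every named hidden input in the first form of the page.
    static Map<String, String> hiddenFields(Document loginPage) {
        Map<String, String> fields = new LinkedHashMap<>();
        Element form = loginPage.selectFirst("form");
        if (form == null) {
            return fields; // no form on the page, nothing to collect
        }
        for (Element input : form.select("input[type=hidden][name]")) {
            fields.put(input.attr("name"), input.val());
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical markup standing in for the real login page.
        String html = "<form action='/login' method='post'>"
                + "<input type='hidden' name='authenticity_token' value='abc123'>"
                + "<input type='hidden' name='return_to' value='/course'>"
                + "<input type='text' name='educator[email]'>"
                + "</form>";
        Document doc = Jsoup.parse(html);
        System.out.println(hiddenFields(doc));
    }
}
```

The resulting map can be passed to the POST request with `.data(hiddenFields(document))` alongside the email and password fields, since `Connection.data` also accepts a `Map<String, String>`.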

