Logging in and navigating using Jsoup
I am trying to log in to a website using Jsoup. My goal is to scrape some data from the site, but I am having problems with logging in and navigating.
My current code is shown below.
try {
    // Fetch the login page first to obtain the session cookies
    Connection.Response response = Jsoup.connect("https://app.northpass.com/login")
            .method(Connection.Method.GET)
            .execute();
    // Submit the login form with the cookies from the first request
    response = Jsoup.connect("https://app.northpass.com/login")
            .data("educator[email]", "email123")
            .data("educator[password]", "password123")
            .cookies(response.cookies())
            .method(Connection.Method.POST)
            .execute();
    // Go to new page
    Document coursePage = Jsoup.connect("https://app.northpass.com/course")
            .cookies(response.cookies())
            .get();
    System.out.println(coursePage.title());
} catch (IOException e) {
    e.printStackTrace();
}
I have also tried adding
.data("commit", "Log in")
and
.userAgent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.21 (KHTML, like Gecko) Chrome/19.0.1042.0 Safari/535.21")
without any success.
The error I get is as follows:
org.jsoup.HttpStatusException: HTTP error fetching URL. Status=500, URL=https://app.northpass.com/login
From what I have read in other threads, people suggest setting a userAgent (which, as mentioned above, I have already tried). Thanks in advance for any help.
If you look at the network traffic when you attempt a login in your browser, you'll see that an additional item of data is sent: authenticity_token. This is a hidden field in the form.
You will then need to extract it from the initial response and send it with the POST request:
try {
    // Fetch the login page to obtain the session cookies and the hidden token
    Connection.Response response = Jsoup.connect("https://app.northpass.com/login")
            .method(Connection.Method.GET)
            .execute();
    // I can't test this, but it will be something like the following;
    // see https://jsoup.org/cookbook/extracting-data/selector-syntax
    Document document = response.parse();
    // Select the field by name (assuming it is named authenticity_token, as in the
    // browser's request); input[hidden] would only match a bare "hidden" attribute
    String token = document.select("input[name=authenticity_token]").first().val();
    response = Jsoup.connect("https://app.northpass.com/login")
            .data("educator[email]", "email123")
            .data("educator[password]", "password123")
            .data("authenticity_token", token)
            .cookies(response.cookies())
            .method(Connection.Method.POST)
            .execute();
    // Go to new page
    Document coursePage = Jsoup.connect("https://app.northpass.com/course")
            .cookies(response.cookies())
            .get();
    System.out.println(coursePage.title());
} catch (IOException e) {
    e.printStackTrace();
}
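When comparing what your code sends against the request shown in the browser's network tab, it can help to know what the form-encoded POST body should look like once the token is included. The following is only a minimal sketch using the JDK, not part of Jsoup: the class name LoginFormBody and the token value "abc+/=" are made up for illustration, and the field names are the ones from the question. Jsoup performs this encoding for you when you call .data(...), so the sketch is just for checking the request by eye.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only.
final class LoginFormBody {

    // Encodes the fields as application/x-www-form-urlencoded,
    // keeping the iteration order of the map.
    public static String encode(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("authenticity_token", "abc+/="); // made-up token value
        fields.put("educator[email]", "email123");
        fields.put("educator[password]", "password123");
        // Prints:
        // authenticity_token=abc%2B%2F%3D&educator%5Bemail%5D=email123&educator%5Bpassword%5D=password123
        System.out.println(encode(fields));
    }
}
```

Note that the brackets in the field names and the +, / and = characters in the token are percent-encoded in the body. If the body in the browser contains a field that your request lacks, that missing field is the first thing to add.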