
Unable to log in to this page using Jsoup

I've been struggling for a good while now to log in to this page so that I can scrape the private learn.sun.ac.za/my page. I've searched through multiple SO posts and tried to apply the advice from each, to no effect.

Attempt 1

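import java.io.IOException;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class LoginAttempt1 {
    public static void main(String[] args) throws IOException {
        String url = "https://sso-prod.sun.ac.za/cas/login?service=http%3A%2F%2Flearn.sun.ac.za%2Flogin%2Findex.php";
        String userAgent = "Mozilla/5.0";

        // GET the login page to pick up the initial session cookies
        Connection.Response response = Jsoup.connect(url)
                .userAgent(userAgent)
                .method(Connection.Method.GET)
                .execute();

        // POST the credentials to the same URL, sending those cookies back
        response = Jsoup.connect(url)
                .userAgent(userAgent)
                .cookies(response.cookies())
                .data("action", "login")
                .data("username", "MYUSERNAME")
                .data("password", "MYPASSWORD")
                .method(Connection.Method.POST)
                //.followRedirects(true)
                .execute();

        // Fetch the private page with the cookies returned by the POST
        Document doc = Jsoup.connect("http://learn.sun.ac.za/my")
                .cookies(response.cookies())
                .userAgent(userAgent)
                .get();

        System.out.println(doc.title());
    }
}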

Output: "Single Sign on | Corporation", which shows that the login did not succeed.

Following advice in other posts, I monitored the outgoing traffic in Chrome and added all of its request headers, as well as the extra form fields it sent, to the code:

import java.io.IOException;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class LoginAttempt2 {
    public static void main(String[] args) throws IOException {
        String url = "https://sso-prod.sun.ac.za/cas/login?service=http%3A%2F%2Flearn.sun.ac.za%2Flogin%2Findex.php";
        String userAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36";

        // GET the login page to pick up the initial session cookies
        Connection.Response response = Jsoup.connect(url)
                .userAgent(userAgent)
                .method(Connection.Method.GET)
                .execute();

        // POST the form, with headers copied verbatim from one request captured in Chrome
        response = Jsoup.connect(url)
                .userAgent(userAgent)
                .cookies(response.cookies())
                .header("accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8")
                .header("accept-encoding", "gzip, deflate")
                .header("accept-language", "en-US,en;q=0.8")
                .header("cache-control", "max-age=0")
                .header("connection", "keep-alive")
                .header("content-length", "114") // copied from Chrome, not computed from this body
                .header("content-type", "application/x-www-form-urlencoded")
                .header("dnt", "1")
                .header("host", "sso-prod.sun.ac.za")
                .header("origin", "https://sso-prod.sun.ac.za")
                .header("referer", "https://sso-prod.sun.ac.za/cas/login?service=http%3A%2F%2Flearn.sun.ac.za%2Flogin%2Findex.php")
                .header("upgrade-insecure-requests", "1")
                .data("action", "login")
                .data("username", "MYUSERNAME")
                .data("password", "MYPASSWORD")
                // hidden form fields, also copied from the captured Chrome request
                .data("lt", "LT-3042474-9t3oldTU1253G6HVqFffHgMWxnYXdg")
                .data("execution", "e1s1")
                .data("_eventId", "submit")
                .method(Connection.Method.POST)
                //.followRedirects(true)
                .execute();

        // Fetch the private page with the cookies returned by the POST
        Document doc = Jsoup.connect("http://learn.sun.ac.za/my")
                .cookies(response.cookies())
                .userAgent(userAgent)
                .get();

        System.out.println(doc.title());
    }
}

This had the same result. I then printed out the returned HTML and found no login error message anywhere in it. Does that mean I made a mistake somewhere and never actually submitted the form?
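Whether the form was actually accepted can be checked more directly than by searching the HTML for an error message. The sketch below is a heuristic, not something the site documents: it assumes that a page which still contains a password field is the login form. LoginCheck and stillOnLoginForm are hypothetical names, and the two parsed pages are made-up markup for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class LoginCheck {
    /**
     * Heuristic (an assumption): if the returned page still has a password input,
     * we most likely landed back on the login form instead of the target page.
     */
    static boolean stillOnLoginForm(Document page) {
        return !page.select("input[type=password]").isEmpty();
    }

    public static void main(String[] args) {
        // Made-up pages standing in for real responses
        Document loginForm = Jsoup.parse(
                "<form><input name=\"username\"><input type=\"password\" name=\"password\"></form>");
        Document dashboard = Jsoup.parse("<h1>Dashboard</h1>");

        System.out.println(stillOnLoginForm(loginForm)); // prints true
        System.out.println(stillOnLoginForm(dashboard)); // prints false
    }
}
```

In the question's code this would be called on response.parse() right after the POST; printing response.statusCode() and response.url() at the same point also shows which URL the request finally landed on after any redirects.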

This is what a successful login in Chrome looks like:
[Screenshots of the successful Chrome login request not available]

That's because your LT value is wrong: you have to get a fresh value for each new session instead of sending a fixed value as you do. Look at the HTML of the page:

[Screenshot of the page HTML showing the LT value]

The selector for the DIV that contains the value is div.controls:nth-child(1).
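Reading the value with that selector in jsoup could look like the sketch below. It assumes the value sits in an input element inside that DIV, and it takes the first match, since div.controls:nth-child(1) can match more than one DIV on a page. LtExtractor and extractLt are hypothetical names, and the HTML is a made-up stand-in for the real login page.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class LtExtractor {
    // Selector quoted in the answer for the DIV that wraps the LT value
    static final String LT_DIV_SELECTOR = "div.controls:nth-child(1)";

    /** Returns the value of the first input inside the LT DIV, or null if it is missing. */
    static String extractLt(Document loginPage) {
        Element div = loginPage.select(LT_DIV_SELECTOR).first();
        if (div == null) {
            return null;
        }
        Element input = div.select("input").first();
        return input == null ? null : input.val();
    }

    public static void main(String[] args) {
        // Hypothetical markup: the first div.controls wraps the hidden LT input
        String html = "<form id=\"fm1\">"
                + "<div class=\"controls\"><input type=\"hidden\" name=\"lt\" value=\"LT-123-example\"></div>"
                + "<div class=\"controls\"><input type=\"hidden\" name=\"execution\" value=\"e1s1\"></div>"
                + "</form>";
        System.out.println(extractLt(Jsoup.parse(html))); // prints LT-123-example
    }
}
```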
So the steps are:
1. Load the page.
2. Get the value of LT.
3. Add it to your POST request.
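Put together, the three steps could be sketched as follows. This is a minimal sketch that has not been run against the live site, with two assumptions worth stating plainly. First, instead of the DIV selector it copies every hidden input from the freshly loaded page, assuming lt, execution and _eventId are hidden inputs of the login form (as the POST data captured in the question suggests). Second, it assumes that the cookies set while jsoup follows the CAS redirects end up in afterLogin.cookies(); that behaviour may vary between jsoup versions. CasLogin and buildLoginForm are hypothetical names.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class CasLogin {
    static final String LOGIN_URL =
            "https://sso-prod.sun.ac.za/cas/login?service=http%3A%2F%2Flearn.sun.ac.za%2Flogin%2Findex.php";
    static final String USER_AGENT = "Mozilla/5.0";

    /**
     * Step 2: copies every hidden input (assumed to include lt, execution and _eventId)
     * from the freshly loaded login page, then adds the credentials.
     * Nothing session-specific is hard-coded.
     */
    static Map<String, String> buildLoginForm(Document loginPage, String username, String password) {
        Map<String, String> data = new LinkedHashMap<>();
        for (Element input : loginPage.select("input[type=hidden][name]")) {
            data.put(input.attr("name"), input.val());
        }
        data.put("username", username);
        data.put("password", password);
        return data;
    }

    public static void main(String[] args) throws IOException {
        // Step 1: load the login page; this starts a new session with a fresh LT value
        Connection.Response loginPage = Jsoup.connect(LOGIN_URL)
                .userAgent(USER_AGENT)
                .method(Connection.Method.GET)
                .execute();
        Map<String, String> cookies = new HashMap<>(loginPage.cookies());

        // Step 3: POST the fresh hidden values together with the session cookies from step 1
        Connection.Response afterLogin = Jsoup.connect(LOGIN_URL)
                .userAgent(USER_AGENT)
                .cookies(cookies)
                .data(buildLoginForm(loginPage.parse(), "MYUSERNAME", "MYPASSWORD"))
                .method(Connection.Method.POST)
                .followRedirects(true)
                .execute();
        cookies.putAll(afterLogin.cookies());

        // Fetch the private page with every cookie collected so far
        Document my = Jsoup.connect("http://learn.sun.ac.za/my")
                .userAgent(USER_AGENT)
                .cookies(cookies)
                .get();
        System.out.println(my.title());
    }
}
```

Copying all hidden inputs rather than only lt keeps the request in step with whatever the form sends in a browser, so a change to the execution value does not break the login the way the hard-coded LT did.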

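Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.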

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM