
Jsoup login to a website returns only the head of the HTML

I'm having an issue when I try to log in to a website with Jsoup: https://student.naviance.com/sbrunswick . As an example, I'm trying to scrape the user's name from the page shown after login.

This is my code right now:

String initialUrl = "https://student.naviance.com/sbrunswick";
String finalUrl = "https://student.naviance.com/main";
String username = "***myusername***";
String password = "***mypassword***";

Connection.Response loginPageResponse = Jsoup.connect(initialUrl)
                                .userAgent("Mozilla/5.0")
                                .timeout(10 * 1000)
                                .followRedirects(true)
                                .execute();

Map<String, String> cookie = loginPageResponse.cookies();

Map<String, String> mapParams = new HashMap<String, String>();
mapParams.put("login-username", username);
mapParams.put("login-password", password);
mapParams.put("remember_me", "");

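Connection.Response responsePostLogin = Jsoup.connect(finalUrl)
                        .referrer(initialUrl)
                        .userAgent("Mozilla/5.0")
                        .timeout(10 * 1000)
                        .data(mapParams)
                        .cookies(cookie)
                        // Jsoup defaults to GET; a form submission has to be sent as a POST.
                        .method(Connection.Method.POST)
                        .followRedirects(true)
                        .execute();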

Document document = responsePostLogin.parse();
Log.d("HTML", document.html());

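// Carry over any cookies set by the login response, not only the initial ones.
Map<String, String> sessionCookies = new HashMap<String, String>(cookie);
sessionCookies.putAll(responsePostLogin.cookies());

Document page = Jsoup.connect(finalUrl).userAgent("Mozilla/5.0").cookies(sessionCookies).get();
// Chain the class names with dots (no spaces) to match a single h1 carrying both classes.
Log.d("NAME", page.select("h1.components-PageTitle-styles-pageTitle.components-LandingHeader-styles-title").text());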

I analyzed the site's login form and I'm fairly sure I filled in every field (hidden ones included). However, the logged HTML isn't the full page; it only shows the head. This is what Logcat in Android Studio shows when I filter for "HTML":

<!doctype html>
<html lang="en-US">
 <head>
  <title>Naviance Student</title>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width,initial-scale=1,minimum-scale=1">
  <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
  <link rel="apple-touch-icon" href="/apple-icon.png">
  <link rel="apple-touch-icon" sizes="76x76" href="/apple-icon-76x76.png">
  <link rel="apple-touch-icon" sizes="114x114" href="/apple-icon-114x114.png">
  <link rel="apple-touch-icon" sizes="144x144" href="/apple-icon-144x144.png">
  <link rel="apple-touch-icon" sizes="152x152" href="/apple-icon-152x152.png">
  <link rel="apple-touch-icon" sizes="180x180" href="/apple-icon-180x180.png">
  <link rel="apple-touch-startup-image" href="/apple-icon.png">
  <meta name="apple-mobile-web-app-capable" content="yes">
  <meta name="apple-mobile-web-app-title" content="Naviance Student">
  <link rel="icon" type="image/png" sizes="16x16" href="/favicon-16x16.png">
  <link rel="icon" type="image/png" sizes="32x32" href="/favicon-32x32.png">
  <link rel="icon" type="image/png" sizes="96x96" href="/favicon-96x96.png">
  <link rel="manifest" href="/manifest.json">
  <meta http-equiv="Page-Enter" content="RevealTrans(Duration=2.0,Transition=2)">
  <meta http-equiv="Page-Exit" content="RevealTrans(Duration=3.0,Transition=12)">
  <meta http-equiv="cleartype" content="on">
  <meta name="msapplication-config" content="IEconfig.xml">
  <meta name="application-name" content="Naviance Student">
  <meta name="author" content="Naviance">
  <meta http-equiv="Content-Security-Policy" content="upgrade-insecure-requests">
  <link href="/style-16726.css" rel="stylesheet">
  <link rel="preload" href="/main.e6791.js" as="script">
 </head>
 <body> 
  <script src="/rewritten_config.js?v=1605811315155"></script> 
  <div id="root"></div>
  <script src="/fc.vendors~main.bb74e.js"></script>
  <script src="/main.e6791.js" async></script>
 </body>
</html>

Although this output has a "body" element, when I inspect the actual website in a browser, this markup appears as part of the head.

I'm still new to Jsoup. I thought I had it working and spent about half an hour celebrating before I realized it wasn't.

Any ideas on how I can fix this? Thanks.

If you got this far with Jsoup, you probably understand the basics of how it works. How sites use HTML and HTTP varies greatly, though: fully static pages, JavaScript-rendered pages, mostly static pages with some dynamic parts, cookie-based sessions, and so on.

You need to figure out whether anything else is going on. I suspect logging in isn't as simple as sending a username and password alone; it probably also requires a session ID, possibly a 'Referer' header, and some form token.
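As a rough illustration of those pieces, here is a minimal sketch in plain Java that assembles what a login request would carry. The class LoginRequestParts, its method names, and the hidden field name "csrf_token" are hypothetical; the real field names have to be read from the site's login form. Only "login-username" and "login-password" come from the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Jsoup: collects the parts of a login request.
final class LoginRequestParts {

    // Start from the hidden fields found in the login form (e.g. a token),
    // then add the credentials, so the token is sent along with them.
    static Map<String, String> formFields(Map<String, String> hiddenFields,
                                          String username, String password) {
        Map<String, String> fields = new LinkedHashMap<>(hiddenFields);
        fields.put("login-username", username);
        fields.put("login-password", password);
        return fields;
    }

    // Cookies from a later response replace earlier ones with the same name,
    // which matters when the server issues a new session ID after login.
    static Map<String, String> mergeCookies(Map<String, String> initial,
                                            Map<String, String> afterLogin) {
        Map<String, String> merged = new LinkedHashMap<>(initial);
        merged.putAll(afterLogin);
        return merged;
    }

    // Formats cookies the way they appear in a Cookie request header.
    static String cookieHeader(Map<String, String> cookies) {
        StringJoiner joiner = new StringJoiner("; ");
        for (Map.Entry<String, String> e : cookies.entrySet()) {
            joiner.add(e.getKey() + "=" + e.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> hidden = new LinkedHashMap<>();
        hidden.put("csrf_token", "abc123"); // assumed field name and value
        System.out.println(formFields(hidden, "user", "secret"));
    }
}
```

With Jsoup, maps like these would be passed to Connection.data(Map) and Connection.cookies(Map), together with Connection.method(Connection.Method.POST) and Connection.referrer(String) for the 'Referer' header.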

If you open /rewritten_config.js?v=1605811315155, you'll likely find some AJAX GET requests and a bunch of document.write statements, possibly heavily obfuscated. It is possible to work out what's going on, but I typically struggle with this, so I won't invest my own time in it.

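You may be better off with another technology such as Selenium, PhantomJS, or another headless browser. Jsoup only parses the HTML the server returns and does not execute JavaScript, which is why the logged body contains an empty <div id="root"></div> instead of the rendered page. HtmlUnit is similar to Jsoup and might work, but you'd have to try it and see.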
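Before switching tools, it can help to confirm that the response really is a JavaScript-rendered shell, as the logged body with its empty <div id="root"></div> suggests. The following is a rough heuristic in plain Java, not a Jsoup feature; the class and method names are made up for this sketch, and the regular expressions are only good enough for a quick check, not for general HTML parsing.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: guesses whether an HTML response is an empty
// JavaScript application shell (scripts plus empty containers, no text).
final class ShellCheck {

    private static final Pattern BODY =
            Pattern.compile("(?is)<body\\b[^>]*>(.*)</body>");
    private static final Pattern SCRIPT_TAG =
            Pattern.compile("(?i)<script\\b");

    static boolean looksLikeJsShell(String html) {
        Matcher m = BODY.matcher(html);
        if (!m.find()) {
            return false; // no body element, nothing to judge
        }
        String body = m.group(1);
        boolean hasScripts = SCRIPT_TAG.matcher(body).find();
        // Drop script elements, then all remaining tags, and see if text is left.
        String withoutScripts = body.replaceAll("(?is)<script\\b.*?</script>", "");
        String visibleText = withoutScripts.replaceAll("(?s)<[^>]*>", "").trim();
        return hasScripts && visibleText.isEmpty();
    }

    public static void main(String[] args) {
        // Shaped like the body from the log in the question.
        String logged = "<html><head><title>Naviance Student</title></head><body>"
                + "<script src=\"/rewritten_config.js\"></script>"
                + "<div id=\"root\"></div>"
                + "<script src=\"/main.e6791.js\" async></script>"
                + "</body></html>";
        System.out.println(looksLikeJsShell(logged));
    }
}
```

If the check returns true, the content you are after is built in the browser by the page's scripts, so changing the Jsoup login request will not make it appear in the response. With Jsoup itself, a comparable check would be whether document.body().text() is empty.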

