Not getting all HTML from a website using jsoup

I know there are many questions of this type, but I didn't find an answer.

The problem is as stated in the title. The webpage first requires me to enter a login and password. After logging in, I try to get the whole HTML, but I receive only about 10-20% of it, and I don't even know where the rest of the received data comes from (example below):

> <script type="text/javascript">
> 
> window.currentUserName="******"; 
> window.currentUserGroups={};
> window.currentUserRoles={};
> 
> window.currentUserGroups['InventoryEditor']=true;
> window.currentUserRoles['InventoryEditor']=true;
> window.currentUserGroups['inventoryScriptBypass']=true;
> window.currentUserRoles['User']=true;
> window.currentUserGroups['rsuser']=true;
> window.currentUserRoles['User']=true;
> 
> 
> window.getCurrentUserName = function() {
>     return window.currentUserName; };
> 
> window.currentUserHasGroup = function(groupName) {
>     return window.currentUserGroups[groupName] ? true : false; };
> 
> window.currentUserHasGroups = function(groupNames) {
>     for (var i = 0; i < groupNames.length; i++) {
>         if (window.currentUserHasGroup(groupNames[i])) {
>             return true;
>         }
>     }
>     return false; };

and many, many more...

My source code is below:

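import java.io.IOException;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class Main {
    public static void main(String[] args) throws IOException {
        // Fetch the login page first to obtain the initial cookies.
        Connection.Response loginForm = Jsoup.connect("Website before login")
                .method(Connection.Method.GET)
                .execute();

        // Request the page behind the login, sending the form fields and cookies.
        Document document = Jsoup.connect("Website after login")
                //.maxBodySize(0)
                //.timeout(0)
                .data("cookieexists", "false")
                .data("login", "login")
                .data("password", "password")
                .data("submit", "Sign in")
                .cookies(loginForm.cookies())
                .get();

        System.out.println(document);
    }
}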

When I try to get the HTML from the website before logging in, it works well. Thanks for your help!
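Separately from any JavaScript issue, one thing that may be worth checking is how the credentials are sent. The code in the question calls `.get()`, so the form fields travel as query parameters, whereas many login forms expect a POST. The following is only a sketch under assumptions: the URLs are hypothetical placeholders, the form field names are copied from the question, and whether this particular site accepts such a POST is unknown. `maxBodySize(0)` lifts jsoup's default cap on the response body size, which can also truncate very large pages.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class LoginSketch {
    // Builds (but does not send) the login request as a POST.
    // Field names are taken from the question; the real form may differ.
    static Connection buildLogin(String loginUrl, Map<String, String> cookies,
                                 String user, String password) {
        return Jsoup.connect(loginUrl)
                .cookies(cookies)
                .data("cookieexists", "false")
                .data("login", user)
                .data("password", password)
                .data("submit", "Sign in")
                .method(Connection.Method.POST);
    }

    public static void main(String[] args) throws IOException {
        String loginUrl = "https://example.com/login";       // hypothetical
        String targetUrl = "https://example.com/inventory";  // hypothetical

        // Step 1: load the login page to pick up the initial cookies.
        Connection.Response loginPage = Jsoup.connect(loginUrl)
                .method(Connection.Method.GET)
                .execute();

        // Step 2: submit the credentials with POST.
        Connection.Response loggedIn =
                buildLogin(loginUrl, loginPage.cookies(), "login", "password").execute();

        // Step 3: merge both cookie sets and fetch the protected page.
        Map<String, String> session = new HashMap<>(loginPage.cookies());
        session.putAll(loggedIn.cookies());

        Document document = Jsoup.connect(targetUrl)
                .cookies(session)
                .maxBodySize(0) // 0 means no limit on the body size
                .get();

        System.out.println(document.outerHtml());
    }
}
```

If the page still comes back mostly as `<script>` blocks after this, the remaining content is most likely generated in the browser by JavaScript, which jsoup does not run.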

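“If the content depends on JavaScript, then jsoup is not enough (it has no JavaScript support). If you use the search function, you will find suggestions to use, e.g., HtmlUnit. Without the page URL and login credentials, the problem is not reproducible.”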
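The point about missing JavaScript support can be seen without any network access. The sketch below parses a small in-memory page (the markup and the class name `NoJavaScriptDemo` are made up for illustration) whose `<div>` would be filled in by a script in a real browser. jsoup only parses the markup, so the `<div>` stays empty and the script is available only as text.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class NoJavaScriptDemo {
    // Illustrative page: a browser would run the script and fill the div.
    static final String HTML =
            "<html><body><div id=\"content\"></div>"
            + "<script>document.getElementById('content').textContent = 'loaded';</script>"
            + "</body></html>";

    static Document parse() {
        return Jsoup.parse(HTML);
    }

    public static void main(String[] args) {
        Document doc = parse();

        // The div is still empty because the script was never executed.
        System.out.println("div is empty: "
                + doc.getElementById("content").text().isEmpty());

        // The script is present only as raw source text.
        System.out.println("script source: "
                + doc.select("script").first().data());
    }
}
```

This matches the output in the question, where the response consists largely of `window.currentUser...` script source rather than rendered page content. A tool that executes JavaScript, such as the HtmlUnit library mentioned in the comment, would be needed to obtain the rendered DOM.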
