Jsoup.connect(url).get() returns only half the code

I have the following code:

String url="http://www.fastvturesults.com/check_new_results/1rn12ec187";
Document doc=Jsoup.connect(url).get();
Log.i("DATA", doc.toString());

And here is my logcat output:

I/DATA﹕ <!DOCTYPE html>
<html lang="en">
<head>
<meta name="robots" content="noindex">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta property="og:site_name" content="Fast VTU Results - VTU Students Online Community">
<meta property="og:type" content="article">
<meta property="og:title" content="NISHANTH O(1RN12EC187)">
<meta property="og:description" content="NISHANTH O (1RN12EC187)">
<meta name="author" content="Harish">
<meta http-equiv="content-type" content="text/html;charset=UTF-8">
<script type="text/javascript">
//<![CDATA[
try{if (!window.CloudFlare) {var CloudFlare=[{verbose:0,p:0,byc:0,owlid:"cf",bag2:1,mirage2:0,oracle:0,paths:{cloudflare:"/cdn-cgi/nexp/dok3v=1613a3a185/"},atok:"495d5c7bbce19cd697869e6932b33c4a",petok:"1da02c85fa35bc2e676b85c137d245a01ea1bafe-1427603478-1800",zone:"fastvturesults.com",rocket:"0",apps:{"abetterbrowser":{"ie":"6"}}}];!function(a,b){a=document.createElement("script"),b=document.getElementsByTagName("script")[0],a.async=!0,a.src="//ajax.cloudflare.com/cdn-cgi/nexp/dok3v=919620257c/cloudflare.min.js",b.parentNode.insertBefore(a,b)}()}}catch(e){};
//]]>
</script>
<link rel="shortcut icon" href="http://www.fastvturesults.com/ico/favicon.ico">
<!-- HTML5 shim and Respond.js IE8 support of HTML5 elements and media queries -->
<!--[if lt IE 9]>
<script src="assets/js/html5shiv.js"></script>
<script src="assets/js/respond.min.js"></script>
<![endif]-->
<link rel="stylesheet" type="text/css" href="http://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/3.0.3/css/bootstrap.min.css">
<style>
a{
color: #B94A48;
text-decoration: none;
}
.box-red-round{
background-color: #ffffff;
}
#fbPopup{
margin-top: 10%;
}
.navbar-custom {
background-color: #B94A48;
color: #ffffff;
border-radius: 0;
}
.navbar-custom .navbar-nav>li>a {
color: #fff;
}
.navbar-custom .navbar-nav>.active>a
{
color: #ffffff;
background-color: #000000;
}
.navbar-custom .navbar-nav>.active>a:hover,.navbar-custom .navbar-nav>.active>a:focus,.navbar-nav>li:hover,.navbar-nav>li:focus
{
color: #ffffff;
background-color: #000000;
}
.navbar-custom .navbar-brand {
color: #ffffff;
}
.blog-post-image{
float: left !important;
margin: 20px 20px;
}
.mini-nav-div{
background-color: #B94A48;
color: #ffffff;
}
.mini-nav-div a{
color: #ffffff;
}
</style>
<script type="text/javascript">
var jq = document.createElement('script');
jq.type = 'text/javascript';
jq.async = true;
jq.src = '//cdnjs.cloudflare.com/ajax/libs/jquery/2.0.3/jquery.min.js';
var s = document.getElementsByTagName('script')[0];
s.parentNode.insertBefore(jq, s);
</script>
<title>NISHANTH O(1RN12EC187)</title>
<meta name="description" content="NISHANTH O (1RN12EC187)">
<meta name="keywords" content="NISHANTH O results, NISHANTH O class rank, NISHANTH O university rank,1RN12EC187 results, 1RN12EC187 class rank, 1RN12EC187 university rank">
<script type="text/javascript">
var gb = document.createElement('script');
gb.type = 'text/javascript';
gb.async = true;
gb.src = ('https:' == document.location.protocol

Looking at the page's source, "document.location.protocol" (the last thing in the logcat output) isn't even halfway through the source code.

Why does the get() method return only the first part of the webpage's source code?

This is not a problem with Jsoup. I don't know about logcat, but this position in the HTML is exactly where the first question mark occurs:

document.location.protocol ? 'https://ssl'

So my guess is that the problem lies in your logging workflow, perhaps some escaping issue, rather than in Jsoup.
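If the cut-off does come from the logging step, one likely culprit (an assumption, not something confirmed in this thread) is that Android's logcat truncates any single log entry at roughly 4 KB, which is about where the output above stops. A minimal sketch of a workaround: print the total length, then split the string into pieces and log each piece separately. The class name LogChunker and the 4000-character chunk size are illustrative choices, and System.out.println stands in for Log.i so the sketch runs as plain Java.

```java
import java.util.ArrayList;
import java.util.List;

public class LogChunker {
    // Illustrative chunk size. The logcat limit (assumed to be roughly 4 KB) is
    // counted in bytes, so text with many multi-byte characters needs smaller chunks.
    static final int MAX_CHUNK = 4000;

    /** Splits text into consecutive pieces of at most maxLen chars, in order. */
    static List<String> chunk(String text, int maxLen) {
        if (maxLen <= 0) {
            throw new IllegalArgumentException("maxLen must be positive");
        }
        List<String> parts = new ArrayList<>();
        for (int start = 0; start < text.length(); start += maxLen) {
            int end = Math.min(text.length(), start + maxLen);
            parts.add(text.substring(start, end));
        }
        return parts;
    }

    public static void main(String[] args) {
        // Stand-in for doc.toString(): a 10,000-character string.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            sb.append((char) ('a' + i % 26));
        }
        String html = sb.toString();

        List<String> parts = chunk(html, MAX_CHUNK);
        System.out.println("total chars: " + html.length() + ", chunks: " + parts.size());
        for (int i = 0; i < parts.size(); i++) {
            // On Android this would be Log.i("DATA", parts.get(i)).
            System.out.println("chunk " + (i + 1) + ": " + parts.get(i).length() + " chars");
        }
    }
}
```

In the Android code from the question, the logging line would become a loop such as for (String part : LogChunker.chunk(doc.toString(), 4000)) Log.i("DATA", part); comparing doc.toString().length() with what actually shows up in logcat then tells you whether Jsoup or the logger dropped the text.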

By the way, to avoid an HTTP 403 error, I had to set a fake user agent to fetch this URL with Jsoup:

Document doc = Jsoup.connect(url).userAgent("Mozilla/5.0").get();

I faced the same problem, where Jsoup.parse missed some of the content. Adding a user agent solved it.

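Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.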

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM