Can I get the URL of the page that I downloaded with jsoup?

Is there a way to get the URL of the page I just downloaded? Not the links contained in the HTML page, but the URL of the actual HTML page itself?

I tried this:

// child refers to the saved HTML file
org.jsoup.nodes.Document doc = Jsoup.parse(child, "UTF-8", "");
String url = doc.location();
System.out.println(url);

But the URL comes back as an empty string.

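Assuming the page you downloaded is a Document, just call Document.location() to get its URL. If the URL you passed to Jsoup.connect() redirects, the Document's location gives you the URL the page was ultimately served from.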
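A minimal sketch of both cases, assuming jsoup is on the classpath. For a live fetch, Jsoup.connect(url).get() returns a Document whose location() is the final URL. For a page parsed from a string or a local file, location() only echoes the base URI given to the parser, which is why Jsoup.parse(child, "UTF-8", "") in the question prints an empty string; passing the page's original address instead (for example Jsoup.parse(child, "UTF-8", "http://brigade3.com/"), where the address is assumed to be known) makes location() return it. The class and method names below are hypothetical.

```java
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class DocumentLocationDemo {
    // Live fetch: jsoup follows redirects by default, so location()
    // reports the URL the page was finally served from.
    static String fetchedLocation(String url) throws IOException {
        Document doc = Jsoup.connect(url).get();
        return doc.location();
    }

    // Saved page: location() echoes the base URI handed to the parser,
    // so pass the page's original address instead of "".
    static String parsedLocation(String html, String originalUrl) {
        Document doc = Jsoup.parse(html, originalUrl);
        return doc.location();
    }

    public static void main(String[] args) {
        String html = "<html><head><title>BukkitCloud | Beta Stage</title></head><body></body></html>";
        // Assumed original address of the saved page.
        System.out.println(parsedLocation(html, "http://brigade3.com/"));
        // An empty base URI gives an empty location, as in the question.
        System.out.println(parsedLocation(html, "").isEmpty());
    }
}
```

The catch for saved pages is that the original address has to come from somewhere outside the parser, which is what the next answer tries to recover.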

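If you normally use WinHTTrack, it saves the URL. Otherwise, what you can do is look for PHP or JavaScript files that are referenced by the site's full URL. For example, the downloaded site below has some such links: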

<html lang="en-US">

<!-- Mirrored from brigade3.com/ by HTTrack Website Copier/3.x [XR&CO'2014], Sat, 13 Dec 2014 04:02:28 GMT -->
<!-- Added by HTTrack --><meta http-equiv="content-type" content="text/html;charset=UTF-8" /><!-- /Added by HTTrack -->
<head>
    <meta charset="UTF-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
        <meta name=viewport content="width=device-width,initial-scale=1">
        <title>BukkitCloud | Beta Stage</title>
                    <link rel="profile" href="http://gmpg.org/xfn/11" />
    <link rel="pingback" href="xmlrpc.php" />
    <link rel="shortcut icon" type="image/x-icon" href="assets/uploads/2013/12/favico.png">
    <link rel='stylesheet' href='http://fonts.googleapis.com/css?family=Open+Sans:400,800,700italic,700,600italic,600,400italic,300italic,300|Source+Sans+Pro:200,300,400|Lato&amp;subset=latin,latin-ext' type='text/css' />
<link rel="alternate" type="application/rss+xml" title="Brigade &raquo; Feed" href="feed/index.html" />
<link rel="alternate" type="application/rss+xml" title="Brigade &raquo; Comments Feed" href="comments/feed/index.html" />
<link rel='stylesheet' id='rs-settings-css'  href='assets/plugins/revslider/rs-plugin/css/settings.css' type='text/css' media='all' />
<link rel='stylesheet' id='rs-captions-css'  href='assets/plugins/revslider/rs-plugin/css/captions.css' type='text/css' media='all' />
<link rel='stylesheet' id='default_style-css'  href='assets/themes/passage/style.css' type='text/css' media='all' />
<link rel='stylesheet' id='stylesheet-css'  href='assets/themes/passage/css/stylesheet.min.css' type='text/css' media='all' />
<!--[if IE 8]>
<link rel='stylesheet' id='ie8-style-css'  href='http://brigade3.com/assets/themes/passage/css/ie8.min.css' type='text/css' media='all' />
<![endif]-->
<!--[if IE 9]>
<link rel='stylesheet' id='ie9-style-css'  href='http://brigade3.com/assets/themes/passage/css/ie9.min.css' type='text/css' media='all' />
<![endif]-->
<link rel='stylesheet' id='style_dynamic-css'  href='assets/themes/passage/css/style_dynamic.css' type='text/css' media='all' />
<link rel='stylesheet' id='responsive-css'  href='assets/themes/passage/css/responsive.min.css' type='text/css' media='all' />
<link rel='stylesheet' id='style_dynamic_responsive-css'  href='assets/themes/passage/css/style_dynamic_responsive.css' type='text/css' media='all' />
<link rel='stylesheet' id='custom_css-css'  href='assets/themes/passage/css/custom_css.css' type='text/css' media='all' />
<script type='text/javascript' src='http://brigade3.com/wp-includes/js/jquery/jquery.js'></script>
<script type='text/javascript' src='http://brigade3.com/wp-includes/js/jquery/jquery-migrate.min.js'></script>
<script type='text/javascript' src='assets/plugins/revslider/rs-plugin/js/jquery.themepunch.revolution.min.js'></script>
<link rel='prev' title='FEATURES' href='features/index.html' />
<link rel='next' title='CONTACT' href='contact/index.html' />
<link rel='canonical' href='index.html' />
<link rel='shortlink' href='index.html' />
        <style type="text/css">
            .comments-link {
                display: none;
            }
                    </style>...
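The sample above also shows where HTTrack keeps the URL: the comment near the top reads "Mirrored from brigade3.com/ by HTTrack Website Copier/3.x ...". A minimal sketch that pulls that address out of a saved page, assuming the comment has exactly the format seen in this sample (other HTTrack versions may word it differently). The class and method names are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HttrackSourceUrl {
    // Matches the header comment seen in the sample, e.g.
    // <!-- Mirrored from brigade3.com/ by HTTrack Website Copier/3.x ... -->
    // and captures the address between "Mirrored from" and "by HTTrack".
    private static final Pattern MIRRORED_FROM =
            Pattern.compile("<!--\\s*Mirrored from\\s+(\\S+)\\s+by HTTrack");

    // Returns the recorded source address, or empty if no such comment exists.
    static Optional<String> sourceOf(String savedHtml) {
        Matcher m = MIRRORED_FROM.matcher(savedHtml);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String saved = "<html lang=\"en-US\">\n"
                + "<!-- Mirrored from brigade3.com/ by HTTrack Website Copier/3.x [XR&CO'2014], "
                + "Sat, 13 Dec 2014 04:02:28 GMT -->\n"
                + "<head><title>BukkitCloud | Beta Stage</title></head>";
        System.out.println(sourceOf(saved).orElse("(no HTTrack comment found)"));
    }
}
```

The comment stores the address without a scheme (brigade3.com/), so a scheme has to be added before using it as a full URL. To run this on a real file, read it first, for example with Files.readString(Path.of("index.html")) on Java 11+, where the file name is only an example.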

Then, as you can see, you search for likely URLs; in this case, those are the URLs linking to the JavaScript files:

<script type='text/javascript' src='http://brigade3.com/wp-includes/js/jquery/jquery.js'></script>
<script type='text/javascript' src='http://brigade3.com/wp-includes/js/jquery/jquery-migrate.min.js'></script>

Then just shorten http://brigade3.com/wp-includes/js/jquery/jquery.js to http://brigade3.com, and you have found the site's URL. I hope this is what you meant!
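The shortening step can be done with java.net.URI rather than by editing the string by hand. A minimal sketch, assuming the useful links are absolute http(s) URLs in src or href attributes; relative links such as assets/themes/passage/style.css carry no host and are ignored. The regular expression is a simplification, not a real HTML parser, and the class and method names are hypothetical.

```java
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SiteOrigin {
    // Absolute http(s) URLs in src="..." or href='...' attributes.
    private static final Pattern ABSOLUTE_LINK =
            Pattern.compile("(?:src|href)\\s*=\\s*['\"](https?://[^'\"]+)['\"]");

    // Shortens an absolute URL to scheme://host[:port], dropping path, query and fragment.
    static String originOf(String absoluteUrl) {
        URI uri = URI.create(absoluteUrl); // IllegalArgumentException if malformed
        if (uri.getScheme() == null || uri.getHost() == null) {
            throw new IllegalArgumentException("not an absolute URL with a host: " + absoluteUrl);
        }
        String port = uri.getPort() == -1 ? "" : ":" + uri.getPort();
        return uri.getScheme() + "://" + uri.getHost() + port;
    }

    // Distinct origins of all absolute links, in document order; relative links never match.
    static Set<String> originsIn(String html) {
        Set<String> origins = new LinkedHashSet<>();
        Matcher m = ABSOLUTE_LINK.matcher(html);
        while (m.find()) {
            try {
                origins.add(originOf(m.group(1)));
            } catch (IllegalArgumentException e) {
                // Skip links java.net.URI rejects, e.g. an unescaped '|' in the query.
            }
        }
        return origins;
    }

    public static void main(String[] args) {
        // prints http://brigade3.com
        System.out.println(originOf("http://brigade3.com/wp-includes/js/jquery/jquery.js"));
        String html = "<script type='text/javascript' src='http://brigade3.com/wp-includes/js/jquery/jquery.js'></script>\n"
                + "<script type='text/javascript' src='http://brigade3.com/wp-includes/js/jquery/jquery-migrate.min.js'></script>\n"
                + "<link rel='stylesheet' href='assets/themes/passage/style.css' type='text/css' />";
        // prints [http://brigade3.com]
        System.out.println(originsIn(html));
    }
}
```

Not every absolute link belongs to the site itself (the sample also loads a stylesheet from fonts.googleapis.com), so treat the result as a set of candidates rather than a definitive answer.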


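Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.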

 
© 2020-2024 STACKOOM.COM