
How can I get a relative path from absolute HTTP paths in Java?

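I am trying to build a web crawler in Java, and I am wondering whether there is a way to get a relative path from an absolute path, given a base URL. I want to replace every absolute path in the HTML that points to the same domain.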

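Because HTTP URLs can contain unsafe characters, I cannot use Java's URI class as described in "How to construct a relative path in Java from two absolute paths (or URLs)?".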
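
To make the limitation concrete, here is a minimal sketch of how `java.net.URI` behaves. The class name `UriLimitsDemo`, its helper methods, and the sample URLs are assumptions made up for illustration; only `new URI(...)`, `URI.create` and `URI.relativize` come from the JDK. It shows two things: a URL with an illegal character (a space here) is rejected, and `relativize` only strips a base path that is a prefix of the target path, so it never produces `../`.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class UriLimitsDemo {
    /** Returns true if java.net.URI refuses to parse the given string. */
    static boolean rejectedByUri(String url) {
        try {
            new URI(url);
            return false;
        } catch (URISyntaxException e) {
            return true;
        }
    }

    /** Relativizes target against base using only java.net.URI. */
    static String uriRelativize(String base, String target) {
        return URI.create(base).relativize(URI.create(target)).toString();
    }

    public static void main(String[] args) {
        // A space is not a legal URI character, so parsing fails.
        System.out.println(rejectedByUri("http://www.example.com/mysite/my page.html"));
        // The base path is a prefix of the target path, so the prefix is stripped.
        System.out.println(uriRelativize("http://www.example.com/mysite/",
                "http://www.example.com/mysite/sub/new.html"));
        // Sibling directory: no "../" is generated, the target comes back unchanged.
        System.out.println(uriRelativize("http://www.example.com/mysite/",
                "http://www.example.com/myanothersite/new.html"));
    }
}
```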

I am using jsoup to parse my HTML. It seems to be able to get an absolute path from a relative one, but not the other way around.

For example, take the following page:

"http://www.example.com/mysite/base.html"

The source of base.html may contain:

<a href="http://www.example.com/myanothersite/new.html"> Another site of mine </a>

I am trying to cache this base.html and edit it so that it now contains:

<a href="../myanothersite/new.html">Another site of mine</a>

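Here is a different approach that does not need a given baseUrl and uses a slightly more advanced method: split both URLs at "/", skip the leading elements they share, then go up one level for each remaining source directory and append what is left of the target.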

    // assumes both URLs share the same scheme and host
    String sourceUrl = "http://www.example.com/mysite/whatever/somefolder/bar/unsecure!+?#whätyöühäv€it/site.html"; // the page being rewritten
    String targetUrl = "http://www.example.com/mysite/whatever/otherfolder/other.html"; // the link target
    String expectedTarget = "../../../otherfolder/other.html";
    String[] sourceElements = sourceUrl.split("/");
    String[] targetElements = targetUrl.split("/"); // keep in mind that the arrays may differ in length!
    StringBuilder uniquePart = new StringBuilder();
    StringBuilder relativePart = new StringBuilder();
    boolean stillSame = true;
    for (int ii = 0; ii < sourceElements.length || ii < targetElements.length; ii++) {
        // skip the leading elements that both URLs share
        if (stillSame && ii < targetElements.length && ii < sourceElements.length
                && sourceElements[ii].equals(targetElements[ii])) continue;
        stillSame = false;
        // collect what is left of the target
        if (targetElements.length > ii)
            uniquePart.append("/").append(targetElements[ii]);
        // one "../" per remaining source directory (the file name itself does not count)
        if (sourceElements.length > ii + 1)
            relativePart.append("../");
    }

    // uniquePart starts with "/", so drop it before joining; this also works when relativePart is empty
    String result = relativePart + (uniquePart.length() > 0 ? uniquePart.substring(1) : "");
    System.out.println("result: " + result + " (as expected: " + expectedTarget.equals(result) + ")");
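
The same idea can be wrapped in a reusable method. The following is only a sketch of a variant, not the code above verbatim: the class name `RelativeLinks` and the method name `relativize` are hypothetical, and it assumes both arguments are absolute URLs with no "/" inside a query string or fragment. If scheme or host differ, it leaves the target absolute.

```java
final class RelativeLinks {
    /**
     * Builds a relative reference from the page at sourceUrl to targetUrl.
     * Returns targetUrl unchanged when scheme or host differ.
     */
    static String relativize(String sourceUrl, String targetUrl) {
        // limit -1 keeps a trailing empty element, so ".../dir/" still ends in a directory
        String[] source = sourceUrl.split("/", -1);
        String[] target = targetUrl.split("/", -1);

        // Count the leading elements both URLs share. The last element of each
        // array is the file name, so it never counts as a shared directory.
        int common = 0;
        while (common < source.length - 1 && common < target.length - 1
                && source[common].equals(target[common])) {
            common++;
        }
        // "http:", "" and the host are the first three elements
        if (common < 3) {
            return targetUrl;
        }

        StringBuilder result = new StringBuilder();
        // one "../" per source directory below the shared prefix
        for (int i = common; i < source.length - 1; i++) {
            result.append("../");
        }
        // then the remaining elements of the target
        for (int i = common; i < target.length; i++) {
            result.append(target[i]);
            if (i < target.length - 1) {
                result.append('/');
            }
        }
        // an empty reference would point at the page itself, so name the directory
        return result.length() == 0 ? "./" : result.toString();
    }

    public static void main(String[] args) {
        // prints ../myanothersite/new.html
        System.out.println(relativize(
                "http://www.example.com/mysite/base.html",
                "http://www.example.com/myanothersite/new.html"));
    }
}
```

With jsoup, such a method could be applied to each link's resolved absolute URL before writing the attribute back.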

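With a known baseUrl, this should do it. Keep in mind that you can also compute the baseUrl yourself by comparing the source URL with the target URL and taking the leading part they have in common!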

    String baseUrl = "http://www.example.com/mysite/whatever/"; // the base of your site
    String sourceUrl = "http://www.example.com/mysite/whatever/somefolder/bar/unsecure!+?#whätyöühäv€it/site.html"; // the page being rewritten
    String targetUrl = "http://www.example.com/mysite/whatever/otherfolder/other.html"; // the link target
    String expectedTarget = "../../../otherfolder/other.html";
    // both URLs must live under the base, otherwise the link has to stay absolute
    if (!sourceUrl.startsWith(baseUrl) || !targetUrl.startsWith(baseUrl))
        throw new IllegalArgumentException("both URLs must start with " + baseUrl);

    // cut away the base
    sourceUrl = sourceUrl.substring(baseUrl.length());
    if (!sourceUrl.startsWith("/"))
        sourceUrl = "/" + sourceUrl;

    // construct the relative levels up: one "../" per directory left in the source path
    StringBuilder bar = new StringBuilder();
    while (sourceUrl.indexOf("/", 1) > 0) {
        bar.append("../");
        sourceUrl = sourceUrl.substring(sourceUrl.indexOf("/", 1));
    }

    // add the unique part of the target
    bar.append(targetUrl.substring(baseUrl.length()));

    System.out.println("expectation: " + expectedTarget.equals(bar.toString()));
    System.out.println("bar: " + bar);
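
The remark about computing the baseUrl from the two URLs could be sketched as follows. This is one possible reading of that hint, not something spelled out in the answer: the class `BaseUrlGuess` and the method `commonBase` are hypothetical names, and the sketch simply takes the longest shared prefix that ends at a "/". For URLs on different hosts the result is only the shared scheme part and is not a usable base.

```java
final class BaseUrlGuess {
    /** Longest shared prefix of the two URLs that ends at a '/' (a directory boundary). */
    static String commonBase(String sourceUrl, String targetUrl) {
        int limit = Math.min(sourceUrl.length(), targetUrl.length());
        int lastSlash = -1;
        // walk the shared characters and remember the last '/' seen
        for (int i = 0; i < limit && sourceUrl.charAt(i) == targetUrl.charAt(i); i++) {
            if (sourceUrl.charAt(i) == '/') {
                lastSlash = i;
            }
        }
        return sourceUrl.substring(0, lastSlash + 1);
    }

    public static void main(String[] args) {
        // prints http://www.example.com/mysite/whatever/
        System.out.println(commonBase(
                "http://www.example.com/mysite/whatever/somefolder/bar/site.html",
                "http://www.example.com/mysite/whatever/otherfolder/other.html"));
    }
}
```

Cutting at the last shared "/" rather than at the last shared character keeps directory names such as "somefolder" and "someother" from being split in the middle.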

