JSoup not translating ampersand in links in html

In JSoup the following test case should pass, but it does not.

@Test
public void shouldPrintHrefCorrectly(){
    String content=  "<li><a href=\"#\">Good</a><ul><li><a href=\"article.php?boid=1865&sid=53&mid=1\">" +
            "Boss</a></li><li><a href=\"article.php?boid=186&sid=53&mid=1\">" +
            "heavent</a></li><li><a href=\"article.php?boid=167&sid=53&mid=1\">" +
            "hellos</a></li><li><a href=\"article.php?boid=181&sid=53&mid=1\">" +
            "Mr.Jackson!</a></li>";

    Document document = Jsoup.parse(content, "http://www.google.co.in/");
    Elements links = document.select("a[href^=article]");
    Iterator<Element> iterator = links.iterator();
    List<String> urls = new ArrayList<String>();
    while(iterator.hasNext()){
        urls.add(iterator.next().attr("href"));
    }

    Assert.assertTrue(urls.contains("article.php?boid=181&sid=53&mid=1"));
}

Could anyone please explain why it is failing?

There are three problems:

  1. You're asserting that a bovikatanid parameter is present, while it's actually called boid.

  2. The HTML source is using & instead of &amp; in the href values. This is technically invalid.

  3. Jsoup is parsing &mid as | (&mid; is the HTML named entity for the "divides" sign, ∣). It should have scanned until ; before treating it as an entity.

To fix #1, you have to do it yourself. To fix #2, you have to report this issue to the server admin in question (it's their fault; however, since the average browser is forgiving about this, I'd imagine that Google is doing this to save bandwidth). To fix #3, I've reported an issue to the Jsoup guy to see what he thinks about this.
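If you cannot get the source HTML fixed, one possible stopgap for #2 is to escape the bare ampersands yourself before handing the markup to the parser. The following is only a minimal sketch using plain java.util.regex; the class name AmpersandEscaper and the method escapeBareAmpersands are hypothetical helpers invented for this illustration, not part of Jsoup. It assumes that any & already followed by a name or number and a ; is a real character reference and leaves it alone, so a query parameter that happens to look like an entity (for example &copy;=1) would be missed.

```java
import java.util.regex.Pattern;

public class AmpersandEscaper {
    // Matches an '&' that does NOT already start a character reference
    // such as "&amp;", "&#38;" or "&#x26;" (a name or number followed by ';').
    private static final Pattern BARE_AMPERSAND =
            Pattern.compile("&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)");

    // Hypothetical helper: rewrites every bare '&' as "&amp;".
    public static String escapeBareAmpersands(String html) {
        return BARE_AMPERSAND.matcher(html).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        String href = "article.php?boid=181&sid=53&mid=1";
        // prints article.php?boid=181&amp;sid=53&amp;mid=1
        System.out.println(escapeBareAmpersands(href));
    }
}
```

Running the escaped markup through an HTML parser should then give back the original & characters in the attribute value, since &amp; is decoded to &.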


Update: see, Jonathan (the Jsoup guy) has fixed it. It'll be there in the next release.
