
Jsoup .select returns an empty value, but the element does contain text

I'm trying to get the text of the "link" tag element in this XML: http://www.istana.gov.sg/latestupdate/rss.xml

I have written code to get the first article.

        URL = getResources().getString(R.string.istana_home_page_rss_xml);
        // URL = "http://www.istana.gov.sg/latestupdate/rss.xml";

        try {
            doc = Jsoup.connect(URL).ignoreContentType(true).get();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

        // retrieve the link of the article
        links = doc.select("link");

        // retrieve the publish date of the article
        dates = doc.select("pubDate");

        //retrieve the title of the article
        titles = doc.select("title");

        String[] article1 = new String[3];
        article1[0] = links.get(1).text();
        article1[1] = titles.get(1).text();
        article1[2] = dates.get(0).text();

The article comes out fine, but the link returns an empty string (every link element returns ""). The titles and dates have no problems. The link tag contains the URL as its text. Does anyone know why it returns ""?

It looks like the default HTML parser treats <link> as an HTML void (self-closing) element and automatically closes it as <link />, which means the content of this tag is empty and the URL text ends up outside the element.

To solve this problem, use the XML parser instead of the HTML parser. The XML parser gives tag names no special treatment, so <link> keeps its text.

doc = Jsoup.connect(URL)
      .ignoreContentType(true)
      .parser(Parser.xmlParser()) // <-- add this
      .get();
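
To check the idea without a network call, here is a minimal sketch that parses a small, made-up RSS snippet with the JDK's built-in XML parser (javax.xml.parsers). This is not jsoup, only an illustration that an XML parser keeps the text inside <link>. The class name RssLinkDemo, the helper linkText, and the sample feed are assumptions for illustration and do not come from the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RssLinkDemo {
    // Made-up sample feed (assumption); it only mimics the usual RSS layout
    // where the channel has its own <title> and <link> before the items.
    static final String SAMPLE_RSS =
        "<rss version=\"2.0\"><channel>"
        + "<title>Example feed</title>"
        + "<link>http://example.com/</link>"
        + "<item>"
        + "<title>First article</title>"
        + "<link>http://example.com/first</link>"
        + "<pubDate>Mon, 01 Jan 2018 00:00:00 GMT</pubDate>"
        + "</item>"
        + "</channel></rss>";

    // Returns the text of the index-th <link> element in document order.
    static String linkText(String xml, int index) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList links = doc.getElementsByTagName("link");
            return links.item(index).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Index 0 is the channel-level <link>; index 1 belongs to the first <item>.
        System.out.println(linkText(SAMPLE_RSS, 1));
    }
}
```

With this sample, index 1 yields the first article's URL, which matches the indexing used in the question (links.get(1) alongside dates.get(0)).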
