Jsoup .select returns an empty value even though the element contains text

I'm trying to get the text of the <link> elements in this XML feed: http://www.istana.gov.sg/latestupdate/rss.xml

I have written the following code to get the first article.

        URL = getResources().getString(R.string.istana_home_page_rss_xml);
        // URL = "http://www.istana.gov.sg/latestupdate/rss.xml";

        try {
            doc = Jsoup.connect(URL).ignoreContentType(true).get();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

        // retrieve the link of the article
        links = doc.select("link");

        // retrieve the publish date of the article
        dates = doc.select("pubDate");

        //retrieve the title of the article
        titles = doc.select("title");

        String[] article1 = new String[3];
        article1[0] = links.get(1).text();
        article1[1] = titles.get(1).text();
        article1[2] = dates.get(0).text();

The title and date of the article come out fine, but the link is returned as "" (every link element returns ""). Each link tag contains a URL as its text. Does anyone know why it returns an empty string?

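It looks like the default HTML parser treats <link> as the HTML link element, which is a void element that cannot have content, so it closes it immediately as <link />. The URL text then ends up outside the element, which is why text() returns an empty string.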

To solve this problem, use the XML parser instead of the HTML parser. It does not apply HTML-specific rules to tag names, so <link> keeps its text.

doc = Jsoup.connect(URL)
      .ignoreContentType(true)
      .parser(Parser.xmlParser()) // <-- add this
      .get();
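
To see the difference without hitting the network, here is a minimal sketch that parses the same snippet with both parsers. The feed string, the example.com URLs, and the names RssLinkDemo and firstItemLink are made up for illustration and are not taken from the real rss.xml; it assumes jsoup is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

public class RssLinkDemo {
    // Hypothetical inline feed standing in for the real rss.xml.
    static final String RSS =
          "<rss version=\"2.0\"><channel>"
        + "<title>Feed</title><link>http://example.com/</link>"
        + "<item><title>First</title><link>http://example.com/1</link>"
        + "<pubDate>Mon, 01 Jan 2024 00:00:00 GMT</pubDate></item>"
        + "</channel></rss>";

    // Returns the text of the first <link> that is a direct child of an <item>,
    // or null if no such element was found.
    static String firstItemLink(Document doc) {
        Element link = doc.select("item > link").first();
        return link == null ? null : link.text();
    }

    public static void main(String[] args) {
        // Default HTML parser: <link> is handled as an HTML void element.
        Document asHtml = Jsoup.parse(RSS);
        // XML parser: <link> is an ordinary element that keeps its text.
        Document asXml = Jsoup.parse(RSS, "", Parser.xmlParser());

        System.out.println("HTML parser: '" + firstItemLink(asHtml) + "'");
        System.out.println("XML parser:  '" + firstItemLink(asXml) + "'");
    }
}
```

Selecting "item > link" rather than indexing into every link in the document also skips the channel-level <link>, so the first match belongs to the first article.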
