Java - Rome: I am trying to parse an RSS feed but get an error on some channels

I am trying to fetch and parse RSS feeds. I found the Rome library and am using it with the following code:

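import com.rometools.rome.feed.synd.SyndEntry;
import com.rometools.rome.feed.synd.SyndFeed;
import com.rometools.rome.io.FeedException;
import com.rometools.rome.io.SyndFeedInput;
import com.rometools.rome.io.XmlReader;

import java.io.IOException;
import java.net.URL;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

    // Downloads the feed at the given URL and parses it with Rome.
    private SyndFeed parseFeed(String url) throws IllegalArgumentException, FeedException, IOException {
        return new SyndFeedInput().build(new XmlReader(new URL(url)));
    }


    // Reads the first entry of the feed; returns false if fetching or parsing fails.
    public Boolean processRSSContent(String url) {
        try {
            SyndFeed theFeed = this.parseFeed(url);
            SyndEntry entry = theFeed.getEntries().get(0);
            ZonedDateTime entryUtcDate = ZonedDateTime.ofInstant(entry.getPublishedDate().toInstant(), ZoneOffset.UTC);
            String entryTitle = entry.getTitle();
            String entryText = entry.getDescription().getValue();
            return true; // the original snippet had no return on the success path
        }
        catch (FeedException | IOException e) {
            // ParsingFeedException is a subclass of FeedException, so it is covered here
            e.printStackTrace();
            return false;
        }
    }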

For some channels, such as http://feeds.bbci.co.uk/news/world/rss.xml, everything works fine, but for others, such as http://habrahabr.ru/rss/, I get this error:

Invalid XML: Error on line 5: The element type "meta" must be terminated by the matching end-tag "</meta>".
com.rometools.rome.io.ParsingFeedException: Invalid XML: Error on line 5: The element type "meta" must be terminated by the matching end-tag "</meta>". 

I looked at the content behind this link and the XML does look strange. But it's a popular site, and I get the same error on some other sites, so I don't believe the XML itself is the problem. What am I doing wrong? How can I read these RSS channels?

If you open the URL http://habrahabr.ru/rss/ in your browser, you'll notice that it redirects to https://habrahabr.ru/rss/interesting. Your code doesn't handle redirects.

I suggest you use HttpClientFeedFetcher from the rome-fetcher module. It handles redirects and has other advanced features (caching, conditional GETs, compression):

HttpClientFeedFetcher feedFetcher = new HttpClientFeedFetcher();
try {
    SyndFeed feed = feedFetcher.retrieveFeed(new URL("http://habrahabr.ru/rss/"));
    System.out.println(feed.getLink());
} catch (IllegalArgumentException | IOException | FeedException | FetcherException e) {
    e.printStackTrace();
}

EDIT: rome-fetcher is being deprecated, but Apache HttpClient can be used instead, and it is more flexible.
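
If you would rather not add an HTTP client dependency, the redirect can also be followed by hand with the JDK alone. HttpURLConnection does not follow a redirect that switches protocol (http to https, as habrahabr.ru does), which is most likely why the parser ends up with an HTML page instead of the feed. Below is a minimal sketch of that idea, not a definitive implementation: the class name RedirectFollower, its method names and the limit of 5 hops are made up for illustration and are not part of Rome.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper (not part of Rome): follows HTTP redirects manually,
// including http -> https hops that HttpURLConnection will not follow itself.
final class RedirectFollower {
    // Arbitrary safety limit chosen for this sketch (assumption).
    private static final int MAX_REDIRECTS = 5;

    private RedirectFollower() {}

    // True for the status codes that point to another location.
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    // Resolves a Location header value (absolute or relative) against the current URI.
    static URI resolve(URI current, String location) {
        return current.resolve(location);
    }

    // Opens an http/https address, follows up to MAX_REDIRECTS redirects,
    // and returns the body stream of the final response.
    static InputStream open(String address) throws IOException {
        URI uri = URI.create(address);
        for (int hop = 0; hop <= MAX_REDIRECTS; hop++) {
            // The cast assumes an http or https address.
            HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
            conn.setInstanceFollowRedirects(false); // every hop is handled below
            int status = conn.getResponseCode();
            if (!isRedirect(status)) {
                return conn.getInputStream();
            }
            String location = conn.getHeaderField("Location");
            conn.disconnect();
            if (location == null) {
                throw new IOException("Redirect without Location header from " + uri);
            }
            uri = resolve(uri, location);
        }
        throw new IOException("Too many redirects for " + address);
    }
}
```

With this helper, the parseFeed method from the question could build the feed from the returned stream instead of from the URL, for example new SyndFeedInput().build(new XmlReader(RedirectFollower.open(url))). Closing the stream after parsing is left to the caller.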
