
C# RSS reader, dealing with &#146; and similar

I'm trying to write a simple RSS feed reader in C# using the XmlReader class. The problem I've run into is that some feeds use, from what I understand, HTML representations of some characters, such as &#146; for an apostrophe, in the title/description. In fact, a couple of newspapers I was looking at had some articles with just a regular old single quote used as an apostrophe and some where it was replaced with &#146;. I've considered doing string replacements before displaying the title/description, but I'd really rather avoid kludging and find a proper solution, if there is one, that will also work for other characters that use a similar format. Any suggestions would be very much appreciated.
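A minimal sketch of the situation described, using a made-up two-item feed held in a string. XmlReader already resolves XML character references such as &#8217; on its own, so literal &#146; text usually only reaches your code when the feed double-escapes its HTML (as &amp;#146;) or wraps it in CDATA. The sample feed and variable names are illustrative assumptions, not taken from any real feed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical feed: the first title uses an XML character reference,
// the second is double-escaped, as feeds that embed HTML often are.
const string Rss =
    "<rss version=\"2.0\"><channel>" +
    "<item><title>Don&#8217;t panic</title></item>" +
    "<item><title>Don&amp;#146;t panic</title></item>" +
    "</channel></rss>";

var titles = new List<string>();
using (XmlReader reader = XmlReader.Create(new StringReader(Rss)))
{
    while (reader.ReadToFollowing("title"))
    {
        // XmlReader resolves XML references (&#8217;, &amp;) here, but it cannot
        // know that the resulting text is itself HTML-encoded.
        titles.Add(reader.ReadElementContentAsString());
    }
}

// titles[0] is "Don\u2019t panic"   (already decoded by XmlReader)
// titles[1] is "Don&#146;t panic"   (still an HTML character reference)
foreach (string t in titles)
    Console.WriteLine(t);
```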

You can use HttpUtility.HtmlDecode.
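A minimal sketch of that suggestion. It uses System.Net.WebUtility.HtmlDecode, which I'm assuming behaves like HttpUtility.HtmlDecode for this purpose while avoiding a reference to System.Web; the sample strings are made up. Note what happens to &#146;: a numeric reference is decoded to exactly that code point.

```csharp
using System;
using System.Net;

// Named (&rsquo;, &amp;) and numeric (&#8217;) references are all decoded.
string decoded = WebUtility.HtmlDecode("Don&rsquo;t &amp; won&#8217;t");
Console.WriteLine(decoded == "Don\u2019t & won\u2019t"); // True

// &#146; becomes U+0092, a C1 control character,
// not the Windows-1252 apostrophe U+2019.
string raw146 = WebUtility.HtmlDecode("Don&#146;t");
Console.WriteLine((int)raw146[3]); // 146
```

So HtmlDecode on its own handles &rsquo; and &#8217; correctly, but turns &#146; into an invisible control character rather than an apostrophe; that case is the Windows-1252 issue.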

Are you using the built-in features in System.ServiceModel.Syndication whilst reading feeds?

If not, try this; I believe it should automatically solve issues like the one you've described:

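using System.Collections.Generic;
using System.IO;
using System.ServiceModel.Syndication;
using System.Xml;

public enum FeedFormat { Rss, Atom }

public static class FeedReader
{
    public static IEnumerable<SyndicationItem> ReadItems(Stream ms, FeedFormat format)
    {
        using (XmlReader reader = XmlReader.Create(ms))
        {
            // Configure XmlReader reader ...
            // Create a new Syndication Feed (Load accepts both RSS 2.0 and Atom 1.0)
            SyndicationFeed feed = SyndicationFeed.Load(reader);
            SyndicationFeedFormatter formatter;

            switch (format)
            {
                case FeedFormat.Atom:
                    formatter = new Atom10FeedFormatter(feed);
                    break;

                default:
                case FeedFormat.Rss:
                    formatter = new Rss20FeedFormatter(feed);
                    break;
            }

            // formatter.Feed is the same SyndicationFeed instance that Load returned
            foreach (SyndicationItem item in formatter.Feed.Items)
            {
                yield return item;
            }
        }
    }
}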

According to the Unicode spec, 146 (0x92) is not an apostrophe; it is the C1 control character "PRIVATE USE TWO".

You probably have some editors pasting content from Word (with smart quotes enabled), which gives you content in a different encoding (Windows-1252). In Windows-1252, byte 0x92 is the right single quotation mark (’, U+2019).

You should try specifying the correct encoding ("Windows-1252", i.e. code page 1252), and it may work.
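A minimal sketch of the encoding approach, assuming modern .NET (Core 3.0 or later), where code page 1252 only becomes available after registering CodePagesEncodingProvider; on .NET Framework, Encoding.GetEncoding(1252) works without that line. FixC1Controls is a hypothetical helper for text that was already decoded the wrong way, for example by HtmlDecode turning &#146; into U+0092.

```csharp
using System;
using System.Text;

// Needed on .NET Core / .NET 5+ before code page 1252 can be requested.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding win1252 = Encoding.GetEncoding(1252);

// Bytes as a Word-pasted, Windows-1252 source would send them: "Don" + 0x92 + "t".
byte[] bytes = { 0x44, 0x6F, 0x6E, 0x92, 0x74 };
string fromBytes = win1252.GetString(bytes);
Console.WriteLine(fromBytes == "Don\u2019t"); // True

// Hypothetical repair step: map each char in U+0080..U+009F back to a single
// byte and re-read that byte as Windows-1252; all other chars pass through.
static string FixC1Controls(string s, Encoding enc)
{
    var sb = new StringBuilder(s.Length);
    foreach (char c in s)
    {
        if (c >= '\u0080' && c <= '\u009F')
            sb.Append(enc.GetString(new[] { (byte)c }));
        else
            sb.Append(c);
    }
    return sb.ToString();
}

string repaired = FixC1Controls("Don\u0092t", win1252);
Console.WriteLine(repaired == "Don\u2019t"); // True
```

To apply the encoding to a whole feed instead, one option (an assumption about how the feed is fetched) is to wrap the downloaded stream in new StreamReader(stream, win1252) and pass that reader to XmlReader.Create. The repair helper is only a fallback for text that has already been mis-decoded.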
