C# RSS reader, dealing with &#146; and similar

I'm trying to write a simple RSS feed reader in C# using the XmlReader class. The problem I've run into is that some feeds use, from what I understand, HTML representations of some characters, such as &#146; for the apostrophe in the title/description. In fact, a couple of newspapers I was looking at had some articles with just a regular old single quote used as an apostrophe and some where it was replaced with &#146;. I've considered doing string replacements before displaying the title/description, but I'd really rather avoid kludges and find a proper solution, if there is one, that will also work for other characters that use a similar format. Any suggestions would be very much appreciated.
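To see what XmlReader actually hands back in the two situations described, here is a minimal probe. FeedTitleProbe and FirstTitle are hypothetical names, and the sketch assumes the title text sits directly inside a <title> element. As I understand the XML rules, a reference written as &#146; in the feed is resolved by the parser itself into the character U+0092, while an entity that was escaped once more (&amp;#146;, common when feeds embed HTML) comes through as the literal text &#146; and still needs HTML decoding.

```csharp
using System.IO;
using System.Xml;

static class FeedTitleProbe
{
    // Returns the text of the first <title> element in an RSS/XML string, or null if there is none.
    public static string FirstTitle(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            // ReadToFollowing advances to the next start tag named "title".
            if (reader.ReadToFollowing("title"))
            {
                // ReadElementContentAsString resolves XML character references such as &#146;.
                return reader.ReadElementContentAsString();
            }
        }
        return null;
    }
}
```

With the title It&#146;s the probe returns "It\u0092s" (a control character, not an apostrophe); with It&amp;#146;s it returns the literal string "It&#146;s".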

You can use HttpUtility.HtmlDecode.
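A minimal sketch of that idea, assuming the System.Net counterpart WebUtility.HtmlDecode (HttpUtility lives in System.Web, which non-ASP.NET projects may not reference). As far as I can tell, both decoders turn &#146; into U+0092 rather than an apostrophe, so the sketch adds a second step that remaps the C1 range U+0080 to U+009F through the Windows-1252 table. That is the same substitution HTML5 browsers apply to numeric references in that range. FeedText, Clean and Cp1252High are illustrative names, not library APIs.

```csharp
using System.Net;
using System.Text;

static class FeedText
{
    // Windows-1252 characters for bytes 0x80-0x9F; '\0' marks the five unassigned slots.
    private static readonly char[] Cp1252High =
    {
        '\u20AC', '\0',     '\u201A', '\u0192', '\u201E', '\u2026', '\u2020', '\u2021',
        '\u02C6', '\u2030', '\u0160', '\u2039', '\u0152', '\0',     '\u017D', '\0',
        '\0',     '\u2018', '\u2019', '\u201C', '\u201D', '\u2022', '\u2013', '\u2014',
        '\u02DC', '\u2122', '\u0161', '\u203A', '\u0153', '\0',     '\u017E', '\u0178'
    };

    // Decodes HTML entities, then reinterprets C1 control characters (U+0080-U+009F)
    // as the Windows-1252 characters they were most likely meant to be.
    public static string Clean(string text)
    {
        string decoded = WebUtility.HtmlDecode(text);
        var sb = new StringBuilder(decoded.Length);
        foreach (char c in decoded)
        {
            if (c >= '\u0080' && c <= '\u009F' && Cp1252High[c - 0x80] != '\0')
                sb.Append(Cp1252High[c - 0x80]);   // e.g. U+0092 becomes U+2019
            else
                sb.Append(c);                      // everything else passes through unchanged
        }
        return sb.ToString();
    }
}
```

For example, Clean("It&#146;s") and Clean("It&rsquo;s") should both yield "It’s" with U+2019, so feeds that mix the two spellings end up with the same text.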

Are you using the built-in features under System.ServiceModel.Syndication when reading feeds?

If not, try this; I believe it should automatically solve issues like the ones you've described:

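using System.Collections.Generic;
using System.IO;
using System.ServiceModel.Syndication;
using System.Xml;

// FeedFormat is not defined in the original snippet; a minimal enum is assumed here.
public enum FeedFormat { Rss, Atom }

public static class FeedLoader
{
    // Returns the items of an RSS 2.0 or Atom 1.0 feed read from the stream ms.
    public static IEnumerable<SyndicationItem> ReadItems(Stream ms, FeedFormat format)
    {
        XmlReader reader = XmlReader.Create(ms);
        // Configure XmlReader reader ...
        // Create a new Syndication Feed (Load itself detects RSS 2.0 vs. Atom 1.0)
        SyndicationFeed feed = SyndicationFeed.Load(reader);
        SyndicationFeedFormatter formatter;

        switch (format)
        {
            case FeedFormat.Atom:
                formatter = new Atom10FeedFormatter(feed);
                break;

            default:
            case FeedFormat.Rss:
                formatter = new Rss20FeedFormatter(feed);
                break;
        }

        foreach (SyndicationItem item in formatter.Feed.Items)
        {
            yield return item;
        }
    }
}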

According to the Unicode spec, 146 (0x92) is not an apostrophe; it is the "PRIVATE USE ONE" control character.

You probably have some editors pasting content from Word (with smart quotes enabled), which gives you content in a different encoding (Windows-1252), where byte 0x92 is the right single quotation mark (U+2019).

Try specifying the correct encoding ("Windows-1252", code page 1252); that may fix it.
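If the feed bytes really are Windows-1252, one way to follow this advice is to decode the stream yourself and hand XmlReader a TextReader, so the characters are decoded with code page 1252 rather than by whatever the XML declaration names. This is a sketch assuming .NET 5 or later, where code page 1252 only becomes available after registering CodePagesEncodingProvider; on .NET Framework, Encoding.GetEncoding(1252) works directly, so drop the RegisterProvider line there. Cp1252FeedReader and Open are hypothetical names.

```csharp
using System.IO;
using System.Text;
using System.Xml;

static class Cp1252FeedReader
{
    // Opens an XmlReader over a feed whose bytes are known to be Windows-1252.
    public static XmlReader Open(Stream feedBytes)
    {
        // Makes legacy code pages such as 1252 available on .NET 5+ (assumption: not needed on .NET Framework).
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding cp1252 = Encoding.GetEncoding(1252);

        // The StreamReader does the decoding; a byte order mark, if present, still takes precedence.
        var text = new StreamReader(feedBytes, cp1252);

        // CloseInput = true so disposing the XmlReader also disposes the StreamReader.
        return XmlReader.Create(text, new XmlReaderSettings { CloseInput = true });
    }
}
```

A feed containing the single byte 0x92 between "It" and "s" should then read back as "It’s", with no string replacement needed afterwards.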
