Reading RSS feed with Linq-to-XML and C# - how to decode CDATA section?

I am trying to read an RSS feed using C# and LINQ to XML. The feed is encoded in UTF-8 (see http://pc03224.kr.hsnr.de/infosys/feed/), and reading it generally works fine, except for the description node, which is enclosed in a CDATA section.

For some reason I can't see the CDATA tag in the debugger after reading the content of the "description" tag, but I guess it must be there somewhere, because only in this section are the German umlauts (äöü) and other special characters not shown correctly. Instead, they remain in the string as encoded character references such as &#252;.

Can I somehow read them out correctly, or at least decode them afterwards?

This is a sample of the RSS section that is giving me trouble:

<description><![CDATA[blabla bietet H&#246;rern meiner Vorlesungen &#8220;IAS&#8221;, &#8220;WEB&#8221; und &#8220;SWE&#8221; an, Lizenzen f&#252;r blabla [...]]]></description>

Here is my code, which reads and parses the RSS feed data:

RssItems = (from xElem in xml.Descendants("channel").Descendants("item")
                            select new RssItem
                                       {
                                           Content =  xElem.Descendants("description").FirstOrDefault().Value,
                                           ...
                                       }).ToList();

Thanks in advance!

Your code is working as intended. A CDATA section means that its contents are not interpreted by the XML parser, i.e. "&#246;" is not treated as a character reference (HTML entity) but simply as a sequence of characters.
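
To see this behavior in isolation, here is a minimal sketch (C# 9+ top-level statements, using only System.Xml.Linq) that parses the same fragment once with and once without a CDATA wrapper. The short string "H&#246;rern" is taken from the sample above; everything else is illustrative.

```csharp
using System;
using System.Xml.Linq;

// Inside a CDATA section, "&#246;" is stored as six literal characters.
string inCdata = XElement.Parse(
    "<description><![CDATA[H&#246;rern]]></description>").Value;

// Outside CDATA, the XML parser resolves the character reference to 'ö'.
string plain = XElement.Parse(
    "<description>H&#246;rern</description>").Value;

Console.WriteLine(inCdata); // the string value is "H&#246;rern"
Console.WriteLine(plain);   // the string value is "Hörern"
```

So .Value returns exactly what the feed author put inside the CDATA section; LINQ to XML is not losing or mis-decoding anything.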

Contact the author of the RSS feed and ask them to fix it, either by removing the CDATA tags so the character references get interpreted, or by putting the intended characters directly into the feed, which is already UTF-8 encoded.

Alternatively, use HttpUtility.HtmlDecode to decode the CDATA contents a second time.
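
A minimal sketch of that second decoding pass, applied inside a query shaped like the one in the question. It assumes System.Net.WebUtility.HtmlDecode, which decodes the same character references as HttpUtility.HtmlDecode but does not require a System.Web reference. The feed string is a hypothetical, trimmed-down stand-in built from the sample description, and the query yields plain strings instead of the question's RssItem objects so the example stays self-contained.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Xml.Linq;

// Hypothetical minimal feed, modeled on the sample <description> from the question.
const string feed = @"<rss><channel><item>
  <description><![CDATA[H&#246;rern meiner Vorlesungen &#8220;IAS&#8221;, Lizenzen f&#252;r blabla]]></description>
</item></channel></rss>";

XDocument xml = XDocument.Parse(feed);

var contents = (from xElem in xml.Descendants("channel").Descendants("item")
                // The (string) cast yields null instead of throwing if <description> is missing.
                let raw = (string)xElem.Element("description")
                // Second decoding pass: "&#246;" becomes 'ö', "&#8220;" becomes '“', and so on.
                select raw == null ? null : WebUtility.HtmlDecode(raw))
               .ToList();

Console.WriteLine(contents[0]); // Hörern meiner Vorlesungen “IAS”, Lizenzen für blabla
```

In the original query this amounts to Content = WebUtility.HtmlDecode((string)xElem.Element("description")). The (string) cast is used instead of .FirstOrDefault().Value so that an item without a description produces null rather than a NullReferenceException.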
