Use SyndicationFeed to load XML with encoded links

I'm reading an RSS feed using the following code:

var reader = XmlReader.Create(url);
SyndicationFeed.Load(reader);

The RSS looks like this. SyndicationFeed.Load throws an exception when a link tag contains percent-encoded characters (in this case, å encoded as %C3%A5):

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <atom:link rel="self" type="application/rss+xml" href="http://example.com/rss" />
    <title>My RSS</title>
    <description>My RSS</description>
    <pubDate>Mon, 04 Jul 2016 08:19:50 +0200</pubDate>
    <generator>RSS Generator 1.1</generator>
    <link>http://example.com/rss</link>
    <item>
      <title>A title</title>
      <description>A description</description>
      <link>http://bl%C3%A5ljus.se</link>
    </item>
  </channel>
</rss>

The exception is the following:

System.Xml.XmlException: Error in line x position x. An error was encountered when parsing the item's XML. Refer to the inner exception for more details. ---> 
System.UriFormatException: Invalid URI: The hostname could not be parsed.

   at System.Uri.CreateThis(String uri, Boolean dontEscape, UriKind uriKind)
   at System.Uri..ctor(String uriString, UriKind uriKind)
   at System.ServiceModel.Syndication.Rss20FeedFormatter.ReadAlternateLink(XmlReader reader, Uri baseUri)
   at System.ServiceModel.Syndication.Rss20FeedFormatter.ReadItemFrom(XmlReader reader, SyndicationItem result, Uri feedBaseUri)
   --- End of inner exception stack trace ---
   at System.ServiceModel.Syndication.Rss20FeedFormatter.ReadItemFrom(XmlReader reader, SyndicationItem result, Uri feedBaseUri)
   at System.ServiceModel.Syndication.Rss20FeedFormatter.ReadItem(XmlReader reader, SyndicationFeed feed)
   at System.ServiceModel.Syndication.Rss20FeedFormatter.ReadItems(XmlReader reader, SyndicationFeed feed, Boolean& areAllItemsRead)
   at System.ServiceModel.Syndication.Rss20FeedFormatter.ReadXml(XmlReader reader, SyndicationFeed result)
   at System.ServiceModel.Syndication.Rss20FeedFormatter.ReadFeed(XmlReader reader)
   at System.ServiceModel.Syndication.Rss20FeedFormatter.ReadFrom(XmlReader reader)
   at System.ServiceModel.Syndication.SyndicationFeed.Load[TSyndicationFeed](XmlReader reader)
System.UriFormatException: Invalid URI: The hostname could not be parsed.
   at System.Uri.CreateThis(String uri, Boolean dontEscape, UriKind uriKind)
   at System.Uri..ctor(String uriString, UriKind uriKind)
   at System.ServiceModel.Syndication.Rss20FeedFormatter.ReadAlternateLink(XmlReader reader, Uri baseUri)
   at System.ServiceModel.Syndication.Rss20FeedFormatter.ReadItemFrom(XmlReader reader, SyndicationItem result, Uri feedBaseUri)

Are there any settings that I can pass when loading the XML to tell SyndicationFeed to ignore parsing errors? Or is there some other solution?

The issue seems to be the creation of a Uri. You can reproduce it with just this code:

var uri = new Uri("http://bl%C3%A5ljus.se");

A possible solution is to pre-process the XML and decode the link URLs before loading it as a SyndicationFeed.

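var doc = XDocument.Load(url);

// Decode percent-encoded characters in each RSS <link> element.
// Descendants("link") matches only un-namespaced elements, so atom:link is left alone.
foreach (var link in doc.Descendants("link"))
{
    link.Value = WebUtility.UrlDecode(link.Value);
}

using (var reader = doc.CreateReader())
{
    SyndicationFeed.Load(reader);
}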
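
One caveat with decoding every link unconditionally is that links which already parse fine, but contain escapes in the path or query (for example %20 or %26), get decoded as well and may change meaning. A more conservative variant is sketched below: it decodes a link only when the original text fails to parse. The names FeedLoader, NormalizeLink and LoadFeed are made up for this example, and the sketch assumes the formatter parses links with UriKind.RelativeOrAbsolute, which the stack trace suggests but does not prove.

```csharp
using System;
using System.Linq;
using System.Net;
using System.ServiceModel.Syndication;
using System.Xml.Linq;

public static class FeedLoader
{
    // Hypothetical helper: returns the link unchanged when it already parses
    // as a URI; otherwise returns the URL-decoded form if that one parses.
    // If neither parses, the trimmed original is returned untouched.
    public static string NormalizeLink(string raw)
    {
        var trimmed = raw.Trim();
        if (Uri.TryCreate(trimmed, UriKind.RelativeOrAbsolute, out Uri _))
        {
            return trimmed;
        }

        var decoded = WebUtility.UrlDecode(trimmed);
        return Uri.TryCreate(decoded, UriKind.RelativeOrAbsolute, out Uri _)
            ? decoded
            : trimmed;
    }

    // Hypothetical wrapper: loads the XML, fixes up the <link> elements,
    // then hands the document to SyndicationFeed.Load.
    public static SyndicationFeed LoadFeed(string url)
    {
        var doc = XDocument.Load(url);

        // Materialize the list first so the tree is not modified
        // while Descendants is still being enumerated lazily.
        foreach (var link in doc.Descendants("link").ToList())
        {
            link.Value = NormalizeLink(link.Value);
        }

        using (var reader = doc.CreateReader())
        {
            return SyndicationFeed.Load(reader);
        }
    }
}
```

Usage would look like `var feed = FeedLoader.LoadFeed(url);`. A link that fails to parse even after decoding is passed through as-is, so SyndicationFeed.Load can still throw for it; wrap the call in a try/catch if the feed may contain other malformed links.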
