Problems Reading RSS with C# and .NET 3.5

I have been attempting to write some routines to read RSS and Atom feeds using the new classes available in System.ServiceModel.Syndication, but unfortunately the Rss20FeedFormatter bombs out on about half the feeds I try, with the following exception:

 An error was encountered when parsing a DateTime value in the XML. 

This seems to occur whenever the RSS feed expresses the publish date in the following format:

Thu, 16 Oct 08 14:23:26 -0700

If the feed expresses the publish date as GMT, things go fine:

Thu, 16 Oct 08 21:23:26 GMT

If there's some way to work around this with XmlReaderSettings, I have not found it. Can anyone assist?

Based on the workaround posted in the bug report to Microsoft about this, I made an XmlReader specifically for reading SyndicationFeeds that have non-standard dates.

The code below is slightly different from the code in the workaround on Microsoft's site. It also takes Oppositional's advice on using the RFC 1123 pattern.

Instead of simply calling XmlReader.Create(), you need to create the XmlReader from a Stream. I use the WebClient class to get that stream:

WebClient client = new WebClient();
using (XmlReader reader = new SyndicationFeedXmlReader(client.OpenRead(feedUrl)))
{
    SyndicationFeed feed = SyndicationFeed.Load(reader);
    ....
    //do things with the feed
    ....
}

Below is the code for the SyndicationFeedXmlReader:

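using System;
using System.Diagnostics;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Reflection;
using System.ServiceModel.Syndication;
using System.Xml;

// XmlReader that intercepts date elements and replaces values the feed formatters cannot parse.
public class SyndicationFeedXmlReader : XmlTextReader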
{
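    // Element names whose text is treated as a date. lastBuildDate is an RSS 2.0 element,
    // so it is listed with pubDate and checked by the RSS parser rather than the Atom one.
    readonly string[] Rss20DateTimeHints = { "pubDate", "lastBuildDate" };
    readonly string[] Atom10DateTimeHints = { "updated", "published" };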
    private bool isRss2DateTime = false;
    private bool isAtomDateTime = false;

    public SyndicationFeedXmlReader(Stream stream) : base(stream) { }

    public override bool IsStartElement(string localname, string ns)
    {
        isRss2DateTime = false;
        isAtomDateTime = false;

        if (Rss20DateTimeHints.Contains(localname)) isRss2DateTime = true;
        if (Atom10DateTimeHints.Contains(localname)) isAtomDateTime = true;

        return base.IsStartElement(localname, ns);
    }

    public override string ReadString()
    {
        string dateVal = base.ReadString();

        try
        {
            if (isRss2DateTime)
            {
                MethodInfo objMethod = typeof(Rss20FeedFormatter).GetMethod("DateFromString", BindingFlags.NonPublic | BindingFlags.Static);
                Debug.Assert(objMethod != null);
                objMethod.Invoke(null, new object[] { dateVal, this });

            }
            if (isAtomDateTime)
            {
                MethodInfo objMethod = typeof(Atom10FeedFormatter).GetMethod("DateFromString", BindingFlags.NonPublic | BindingFlags.Instance);
                Debug.Assert(objMethod != null);
                objMethod.Invoke(new Atom10FeedFormatter(), new object[] { dateVal, this });
            }
        }
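        catch (TargetInvocationException)
        {
            // The framework's parser rejected the value: substitute the current UTC time
            // in RFC 1123 form. The "r" specifier is culture-invariant, so day and month
            // names stay English regardless of the machine's current culture.
            return DateTimeOffset.UtcNow.ToString("r", CultureInfo.InvariantCulture);
        }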

        return dateVal;

    }

}

Again, this is copied almost exactly from the workaround posted on the Microsoft site in the link above, except that this one works for me and the one posted at Microsoft did not.

NOTE: One bit of customization you may need to do is in the two arrays at the start of the class. Depending on any extraneous fields your non-standard feed might add, you may need to add more items to those arrays.

RSS 2.0 formatted syndication feeds utilize the RFC 822 date-time specification when serializing elements like pubDate and lastBuildDate. The RFC 822 date-time specification is unfortunately a very 'flexible' syntax for expressing the time-zone component of a DateTime.

Time zone may be indicated in several ways. "UT" is Universal Time (formerly called "Greenwich Mean Time"); "GMT" is permitted as a reference to Universal Time. The military standard uses a single character for each zone. "Z" is Universal Time. "A" indicates one hour earlier, and "M" indicates 12 hours earlier; "N" is one hour later, and "Y" is 12 hours later. The letter "J" is not used. The other remaining two forms are taken from ANSI standard X3.51-1975. One allows explicit indication of the amount of offset from UT; the other uses common 3-character strings for indicating time zones in North America.

I believe the issue involves how the zone component of the RFC 822 date-time value is being processed. The feed formatter appears not to handle date-times that use a local differential (such as -0700) to indicate the time zone.

As RFC 1123 extends the RFC 822 specification, you could try using the DateTimeFormatInfo.RFC1123Pattern ("r") to handle converting problematic date-times, or write your own parsing code for RFC 822 formatted dates. Another option would be to use a third-party framework instead of the System.ServiceModel.Syndication namespace classes.

It appears there are some known issues with date-time parsing and the Rss20FeedFormatter that are in the process of being addressed by Microsoft.
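For the "write your own parsing code" option, here is a minimal sketch of what such a parser might look like. The class name Rfc822Date and its TryParse method are made up for illustration, not part of the framework. It assumes the zone is the last space-separated token and handles only numeric offsets, UT/GMT/Z and the North American abbreviations; the single-letter military zones are deliberately left out.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper (not a framework class): parses RFC 822 style dates
// such as "Thu, 16 Oct 08 14:23:26 -0700" into a DateTimeOffset.
public static class Rfc822Date
{
    // Named zones from RFC 822; military single-letter zones other than Z are not handled.
    private static readonly Dictionary<string, TimeSpan> NamedZones =
        new Dictionary<string, TimeSpan>(StringComparer.OrdinalIgnoreCase)
        {
            { "UT", TimeSpan.Zero }, { "GMT", TimeSpan.Zero }, { "Z", TimeSpan.Zero },
            { "EST", TimeSpan.FromHours(-5) }, { "EDT", TimeSpan.FromHours(-4) },
            { "CST", TimeSpan.FromHours(-6) }, { "CDT", TimeSpan.FromHours(-5) },
            { "MST", TimeSpan.FromHours(-7) }, { "MDT", TimeSpan.FromHours(-6) },
            { "PST", TimeSpan.FromHours(-8) }, { "PDT", TimeSpan.FromHours(-7) }
        };

    // Date/time part without the zone: optional day name, 2- or 4-digit year, optional seconds.
    private static readonly string[] DateFormats =
    {
        "ddd, d MMM yyyy H:m:s", "ddd, d MMM yy H:m:s",
        "d MMM yyyy H:m:s", "d MMM yy H:m:s",
        "ddd, d MMM yyyy H:m", "ddd, d MMM yy H:m",
        "d MMM yyyy H:m", "d MMM yy H:m"
    };

    public static bool TryParse(string text, out DateTimeOffset result)
    {
        result = default(DateTimeOffset);
        if (string.IsNullOrEmpty(text)) return false;

        // Split the trailing zone token from the date/time part.
        string trimmed = text.Trim();
        int lastSpace = trimmed.LastIndexOf(' ');
        if (lastSpace < 0) return false;
        string zone = trimmed.Substring(lastSpace + 1);
        string datePart = trimmed.Substring(0, lastSpace).Trim();

        TimeSpan offset;
        if (!TryParseZone(zone, out offset)) return false;

        DateTime local;
        if (!DateTime.TryParseExact(datePart, DateFormats, CultureInfo.InvariantCulture,
                                    DateTimeStyles.AllowWhiteSpaces, out local))
        {
            return false;
        }

        result = new DateTimeOffset(DateTime.SpecifyKind(local, DateTimeKind.Unspecified), offset);
        return true;
    }

    private static bool TryParseZone(string zone, out TimeSpan offset)
    {
        if (NamedZones.TryGetValue(zone, out offset)) return true;

        // Numeric form: a sign followed by exactly four digits (HHMM).
        if (zone.Length == 5 && (zone[0] == '+' || zone[0] == '-'))
        {
            int hhmm;
            if (int.TryParse(zone.Substring(1), NumberStyles.None,
                             CultureInfo.InvariantCulture, out hhmm))
            {
                int hours = hhmm / 100;
                int minutes = hhmm % 100;
                // DateTimeOffset only accepts offsets within 14 hours of UTC.
                if (minutes < 60 && hours * 60 + minutes <= 14 * 60)
                {
                    offset = new TimeSpan(hours, minutes, 0);
                    if (zone[0] == '-') offset = offset.Negate();
                    return true;
                }
            }
        }

        offset = TimeSpan.Zero;
        return false;
    }
}
```

With this sketch, the two dates from the question ("Thu, 16 Oct 08 14:23:26 -0700" and "Thu, 16 Oct 08 21:23:26 GMT") should parse to the same instant. How a two-digit year is expanded depends on the calendar's TwoDigitYearMax setting.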

Interesting. It looks like the datetime formatting is not one of the ones naturally expected by the datetime parser. After looking at the feed classes, it does not look like you can inject your own formatting convention for the parser, and it likely uses a specific scheme for validating the feed.

You may be able to change how the datetime parser behaves by modifying the culture. I have never done it before, so I can't say for sure it would work.

Another solution might be to first transform the feed you are trying to read. Likely not the greatest, but it could get you around the issue.

Good luck.

A similar problem still persists in .NET 4.0, and I decided to work with XDocument instead of directly invoking SyndicationFeed. I described the method I applied (specific to my project) here. Can't say it is the best solution, but it can certainly be considered a "backup plan" in case SyndicationFeed fails.
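To make the "transform the feed first" idea concrete, here is a rough sketch using LINQ to XML. FeedDateNormalizer and its Normalize method are hypothetical names, and the date parser is passed in by the caller as a delegate, so this sketch makes no claim about which inputs can be parsed. It assumes the dates live in pubDate and lastBuildDate elements and rewrites each one it can parse into the RFC 1123 ("r") form.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Hypothetical pre-processing step: rewrites date elements before the feed is loaded.
public static class FeedDateNormalizer
{
    // Element names assumed to carry RFC 822 dates in an RSS 2.0 feed.
    private static readonly string[] DateElements = { "pubDate", "lastBuildDate" };

    // parseDate is supplied by the caller and returns null for values it cannot read.
    public static XDocument Normalize(XDocument feed, Func<string, DateTimeOffset?> parseDate)
    {
        // ToList() snapshots the matches so the tree can be edited while looping.
        foreach (XElement element in feed.Descendants()
                     .Where(e => DateElements.Contains(e.Name.LocalName))
                     .ToList())
        {
            DateTimeOffset? parsed = parseDate(element.Value.Trim());
            if (parsed.HasValue)
            {
                // "r" writes the UTC instant in RFC 1123 form with a GMT suffix.
                element.Value = parsed.Value.ToString("r", CultureInfo.InvariantCulture);
            }
            // Values the parser cannot read are left untouched.
        }
        return feed;
    }
}
```

The normalized document could then be handed to SyndicationFeed.Load via doc.CreateReader(). Since the question reports that GMT dates load fine, the rewritten values should get past the formatter, though I have not verified that against every feed.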
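The linked write-up isn't reproduced here, so the following is not that author's code, just a rough sketch of what an XDocument-based fallback might look like. RawFeedItem and RawRssReader are made-up names, and the sketch assumes a plain RSS 2.0 layout with un-namespaced item, title, link and pubDate elements. The date is kept as raw text so that nothing throws on an odd format.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical container for the handful of fields read from each item.
public class RawFeedItem
{
    public string Title { get; set; }
    public string Link { get; set; }
    public string PubDate { get; set; } // raw text, parsed later (or never)
}

public static class RawRssReader
{
    // Reads every <item> element without going through SyndicationFeed.
    public static List<RawFeedItem> ReadItems(XDocument feed)
    {
        return feed.Descendants("item")
            .Select(item => new RawFeedItem
            {
                // Casting a missing (null) XElement to string yields null rather than throwing.
                Title = (string)item.Element("title"),
                Link = (string)item.Element("link"),
                PubDate = (string)item.Element("pubDate")
            })
            .ToList();
    }
}
```

The document can be loaded with XDocument.Load(feedUrl) or XDocument.Parse, and the PubDate strings can be parsed afterwards with whatever rules the feeds in question require.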
