
Exceptions with DateTime parsing in an RSS feed using SyndicationFeed in C#

I'm trying to parse RSS 2.0 and Atom feeds using SyndicationFeed objects, but I get an XmlException while parsing DateTime fields such as pubDate when the value looks like this:

2012-01-17 08:01:06

public static List<SyndicationItem> getRssData(string url)
{
    // Dispose the reader once the feed has been loaded.
    using (XmlReader reader = XmlReader.Create(url))
    {
        // This is the call that throws XmlException for the pubDate shown above.
        // Exceptions propagate to the caller with their original stack trace.
        SyndicationFeed feed = SyndicationFeed.Load(reader);
        return feed.Items.ToList();
    }
}

The feed URL is http://news.163.com/special/00011K6L/rss_newstop.xml

<item id="2">
    <title>...</title>
    <link>...</link>
    <description>......</description>
    <pubDate>2012-01-17 12:09:29</pubDate> <!-- XmlException is thrown here -->
</item>

Is there a better way to achieve this? Please help. Thanks.

There is a workaround, described under the title "RSS20FeedFormatter throws exception trying to read some DateTime formats".

To work around this problem, create a custom XML reader that recognizes different date formats. The following is an example of how the custom XML reader is used:

XmlReader r = new MyXmlReader(url);
SyndicationFeed feed = SyndicationFeed.Load(r);
Rss20FeedFormatter rssFormatter = feed.GetRss20Formatter();
XmlTextWriter rssWriter = new XmlTextWriter("rss.xml", Encoding.UTF8);
rssWriter.Formatting = Formatting.Indented;
rssFormatter.WriteTo(rssWriter);
rssWriter.Close();

And the class used in the previous code:

class MyXmlReader : XmlTextReader
{
    private bool readingDate = false;
    const string CustomUtcDateTimeFormat = "ddd MMM dd HH:mm:ss Z yyyy"; // Wed Oct 07 08:00:07 GMT 2009

    public MyXmlReader(Stream s) : base(s) { }

    public MyXmlReader(string inputUri) : base(inputUri) { }

    public override void ReadStartElement()
    {
        if (string.Equals(base.NamespaceURI, string.Empty, StringComparison.InvariantCultureIgnoreCase) &&
            (string.Equals(base.LocalName, "lastBuildDate", StringComparison.InvariantCultureIgnoreCase) ||
            string.Equals(base.LocalName, "pubDate", StringComparison.InvariantCultureIgnoreCase)))
        {
            readingDate = true;
        }
        base.ReadStartElement();
    }

    public override void ReadEndElement()
    {
        if (readingDate)
        {
            readingDate = false;
        }
        base.ReadEndElement();
    }

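    public override string ReadString()
    {
        if (readingDate)
        {
            string dateString = base.ReadString();
            DateTime dt;
            // Try the general parser first (it uses the current culture),
            // then fall back to the custom format declared above.
            if(!DateTime.TryParse(dateString,out dt))
                dt = DateTime.ParseExact(dateString, CustomUtcDateTimeFormat, CultureInfo.InvariantCulture);
            // Hand the formatter an RFC 1123 string ("R"), which it can parse.
            return dt.ToUniversalTime().ToString("R", CultureInfo.InvariantCulture);
        }
        else
        {
            return base.ReadString();
        }
    }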
}
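
The conversion step inside ReadString can also be tried on its own, without a feed. The following is only a sketch of that step, not part of the original workaround: the class name RssDateNormalizer and the "yyyy-MM-dd HH:mm:ss" format are assumptions chosen to match the pubDate in the question, and a timestamp with no zone is assumed to be UTC, which may not be true for this feed.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not from the original answer): turns a date string
// into the RFC 1123 form that the "r" format specifier produces.
static class RssDateNormalizer
{
    // Formats tried in order. The second one is an assumption based on
    // the pubDate value shown in the question.
    private static readonly string[] Formats =
    {
        "r",                    // e.g. Tue, 17 Jan 2012 12:09:29 GMT
        "yyyy-MM-dd HH:mm:ss"   // e.g. 2012-01-17 12:09:29
    };

    public static string Normalize(string raw)
    {
        // Assumption: a value without a zone is treated as UTC.
        DateTime dt = DateTime.ParseExact(
            raw.Trim(),
            Formats,
            CultureInfo.InvariantCulture,
            DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal);

        return dt.ToString("r", CultureInfo.InvariantCulture);
    }
}
```

Calling RssDateNormalizer.Normalize("2012-01-17 12:09:29") returns "Tue, 17 Jan 2012 12:09:29 GMT". Unlike DateTime.TryParse, which depends on the current culture, the explicit format list and the invariant culture keep the result the same on every machine; a string that matches none of the formats throws FormatException.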

Basically, that RSS feed is invalid. The RSS 2.0 specification states:

All date-times in RSS conform to the Date and Time Specification of RFC 822, with the exception that the year may be expressed with two characters or four characters (four preferred).

The string "2012-01-17 12:09:29" doesn't comply with the "Date and Time" part of RFC 822, which expects a day, a three-letter month name, a year, a time, and a time zone. It should look something like "Tue, 17 Jan 2012 12:09:29 GMT", with whatever zone actually applies to the feed.
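
To make the difference concrete, here is a small sketch that checks a string against the RFC 1123 pattern (the four-digit-year form of RFC 822 dates) and formats a DateTime the way a feed publisher could. The class and method names are made up for illustration, and the check is stricter than what SyndicationFeed itself may accept, since the "r" specifier only matches the exact "GMT" form.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of any library.
static class PubDateCheck
{
    // True only when the text matches the pattern behind the "r" specifier,
    // for example "Tue, 17 Jan 2012 12:09:29 GMT".
    public static bool IsRfc1123(string text)
    {
        DateTime ignored;
        return DateTime.TryParseExact(
            text, "r", CultureInfo.InvariantCulture, DateTimeStyles.None, out ignored);
    }

    // Formats a timestamp as a pubDate value in RFC 1123 form.
    public static string ToPubDate(DateTime value)
    {
        return value.ToUniversalTime().ToString("r", CultureInfo.InvariantCulture);
    }
}
```

PubDateCheck.IsRfc1123("2012-01-17 12:09:29") is false, while the same instant written as "Tue, 17 Jan 2012 12:09:29 GMT" passes. For a DateTime whose Kind is Utc, ToPubDate produces exactly that second form.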
