
XMLException when processing RSS

I've been trying to process RSS feeds using Argotic for my newsreader application. For most of them it works fine, but on some feeds (like this one) it breaks with the following:

Additional information: For security reasons DTD is prohibited in this XML document. To enable DTD processing set the DtdProcessing property on XmlReaderSettings to Parse and pass the settings into XmlReader.Create method.

The fix for that error was straightforward: I passed an XmlReaderSettings object with DtdProcessing enabled. But then the following exception appeared:

An unhandled exception of type 'System.Xml.XmlException' occurred in System.Xml.dll
Additional information: The ';' character, hexadecimal value 0x3B, cannot be included in a name. Line 9, position 366.

The code I am using:

    XmlReaderSettings settings = new XmlReaderSettings();
    settings.IgnoreComments = true;
    settings.IgnoreWhitespace = true;
    settings.DtdProcessing = DtdProcessing.Parse;

    XmlReader reader = XmlReader.Create(this.url, settings);
    RssFeed feed = new RssFeed();
    feed.Load(reader);

What am I missing?

The exception is telling you that the RSS feed is malformed: specifically, that a name contains the ; character. The W3C XML specification appears to prohibit this:

Document authors are encouraged to use names which are meaningful words or combinations of words in natural languages, and to avoid symbolic or white space characters in names. Note that COLON, HYPHEN-MINUS, FULL STOP (period), LOW LINE (underscore), and MIDDLE DOT are explicitly permitted.

The ASCII symbols and punctuation marks, along with a fairly large group of Unicode symbol characters, are excluded from names

Since other RSS readers also complained, the feed was likely invalid. At the time of writing, however, the W3C validator shows it to be valid!

According to the MSDN documentation for XmlReaderSettings.ConformanceLevel, this issue will cause an exception whatever your ConformanceLevel, but you might find another setting in XmlReaderSettings which can turn the behaviour off (supply the settings to XmlReader.Create). Otherwise, if the feed can't be fixed, you'll have to perform some pre-processing on it.
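The pre-processing mentioned above might look something like the following sketch. It assumes, without confirmation from the feed itself, that the offending input is an ampersand that does not begin a well-formed entity reference (for example a bare `&` or `&;`). `FeedSanitizer`, `EscapeStrayAmpersands`, and `CreateReader` are hypothetical names, and the raw feed text is assumed to have been downloaded into a string already.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;
using System.Xml;

// Hypothetical helper: not part of Argotic or System.Xml.
public static class FeedSanitizer
{
    // Matches '&' that is NOT followed by a named (&amp;), decimal (&#38;)
    // or hexadecimal (&#x26;) reference terminated by ';'.
    private static readonly Regex StrayAmpersand = new Regex(
        @"&(?!(?:[A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Escapes stray ampersands so the parser no longer treats them
    // as the start of an entity reference.
    public static string EscapeStrayAmpersands(string rawXml)
    {
        return StrayAmpersand.Replace(rawXml, "&amp;");
    }

    // Builds an XmlReader over the cleaned text instead of over the URL.
    public static XmlReader CreateReader(string rawXml, XmlReaderSettings settings)
    {
        string cleaned = EscapeStrayAmpersands(rawXml);
        return XmlReader.Create(new StringReader(cleaned), settings);
    }
}
```

The reader returned by `CreateReader` could then be passed to `feed.Load(reader)` as in the question. One limitation of this approach: it also rewrites ampersands inside CDATA sections, where they are already literal, so it is only a stopgap for a feed that cannot be fixed at the source.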

Setting DtdProcessing to Ignore seems to have solved my problem.

    settings.DtdProcessing = DtdProcessing.Ignore;
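For context, here is a minimal, self-contained sketch of that setting in use. It reads from an in-memory string rather than the feed URL so that it can run without Argotic or a network connection; `DtdIgnoreDemo` and `ReadRootName` are hypothetical names used only for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical demo class: shows DtdProcessing.Ignore on a document with a DOCTYPE.
public static class DtdIgnoreDemo
{
    public static string ReadRootName(string xml)
    {
        var settings = new XmlReaderSettings
        {
            IgnoreComments = true,
            IgnoreWhitespace = true,
            // Skip the DOCTYPE instead of throwing (Prohibit) or processing it (Parse).
            DtdProcessing = DtdProcessing.Ignore
        };

        // Dispose the reader deterministically once the root element has been read.
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            reader.MoveToContent();   // advance to the root element
            return reader.LocalName;
        }
    }
}
```

With the feed from the question, the same settings object would be passed to `XmlReader.Create(this.url, settings)` and the reader handed to `feed.Load(reader)` as before.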
