How to make the second test pass in these C# XML unit tests? What am I missing in the initialization of XmlReaderSettings?

Issue description:

  • I need to fix an issue with resolving standard HTML entities.
  • I've implemented HtmlEntityReader, an implementation of XmlReader that contains code to resolve entities.
  • The public API of our system provides methods that take an XmlReader, so users can pass an XmlReader created with one of the XmlReader.Create methods.

The current code of my XML unit tests is below:

using System.Xml;
using NUnit.Framework;

namespace Tests
{
    [TestFixture]
    public class XmlTests
    {
        // this test works
        [Test]
        public void TestEntitiesResolving1()
        {
            var path = QA.ResolvePath(@"html\bugs\317.html");
            using (var reader = new XmlTextReader(path, new NameTable()))
            {
                reader.XmlResolver = null; //to prevent DTD downloading
                var wrapper = new HtmlEntityReader(reader, XmlUtils.HtmlEntities);
                while (wrapper.Read()) { }
            }
        }

        // this test does not work - why?
        // what's the difference in how the internal XmlTextReaderImpl is initialized?
        [Test]
        public void TestEntitiesResolving2()
        {
            var path = QA.ResolvePath(@"html\bugs\317.html");
            var settings = new XmlReaderSettings
                           {
                               XmlResolver = null, //to prevent DTD downloading
                               NameTable = new NameTable(),
                               ProhibitDtd = false,
                               CheckCharacters = false,
                           };
            using (var reader = XmlReader.Create(path, settings))
            {
                var wrapper = new HtmlEntityReader(reader, XmlUtils.HtmlEntities);
                while (wrapper.Read()) { }
            }
        }
    }
}

Partial code of HtmlEntityReader is below:

using System;
using System.Collections;
using System.Xml;

internal sealed class HtmlEntityReader : XmlReader
{
    readonly XmlReader _impl;
    readonly Hashtable _entitySet;
    string _entityValue;

    public HtmlEntityReader(XmlReader reader, Hashtable entitySet)
    {
        if (reader == null) throw new ArgumentNullException("reader");
        if (entitySet == null) throw new ArgumentNullException("entitySet");
        _impl = reader;
        _entitySet = entitySet;
    }

    public override XmlNodeType NodeType
    {
        get { return _entityValue != null ? XmlNodeType.Text : _impl.NodeType; }
    }

    public override string LocalName
    {
        get { return _entityValue != null ? string.Empty : _impl.LocalName; }
    }

    public override string Prefix
    {
        get { return _entityValue != null ? string.Empty : _impl.Prefix; }
    }

    public override string Name
    {
        get { return _entityValue != null ? string.Empty : _impl.Name; }
    }

    public override bool HasValue
    {
        get { return _entityValue != null || _impl.HasValue; }
    }

    public override string Value
    {
        get { return _entityValue ?? _impl.Value; }
    }

    public override bool CanResolveEntity
    {
        get { return true; }
    }

    public override void ResolveEntity()
    {
        // it seems this is never called - why?
    }

    public override bool Read()
    {
        _entityValue = null;
        if (!_impl.Read()) return false;
        if (NodeType == XmlNodeType.EntityReference)
        {
           //resolving of entity reference
           _entityValue = (string)_entitySet[Name];
        }
        return true;
    }

    // ... delegation of XmlReader abstract methods to _impl
}

I get the following exception:

System.Xml.XmlException: Reference to undeclared entity 'nbsp'. Line 4, position 5.
at System.Xml.XmlTextReaderImpl.Throw(Exception e)
at System.Xml.XmlTextReaderImpl.Throw(String res, String arg, Int32 lineNo, Int32 linePos)
at System.Xml.XmlTextReaderImpl.HandleGeneralEntityReference(String name, Boolean isInAttributeValue, Boolean pushFakeEntityIfNullResolver, Int32 entityStartLinePos)
at System.Xml.XmlTextReaderImpl.HandleEntityReference(Boolean isInAttributeValue, EntityExpandType expandType, ref Int32 charRefEndPos)
at System.Xml.XmlTextReaderImpl.ParseText(ref Int32 startPos, ref Int32 endPos, ref Int32 outOrChars)
at System.Xml.XmlTextReaderImpl.ParseText()
at System.Xml.XmlTextReaderImpl.ParseElementContent()
at System.Xml.XmlTextReaderImpl.Read()
... private stuff

Could you give me some quick advice or a link to a solution while I keep fixing / investigating / searching for this issue on my own?

I've done some research on your question, and as far as I can tell the only way to ensure that character entities are resolved is to declare them in a DTD. You can resolve the DTD contents yourself (e.g. for caching) by deriving an implementation from the System.Xml.XmlResolver base class and responding to GetEntity calls with a stream containing the DTD data.
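A minimal sketch of the resolver approach described above, assuming .NET 4 or later (for `DtdProcessing`). `InMemoryDtdResolver` is a hypothetical name, and the `http://example.com/html.dtd` URI and the DTD text are placeholders: the resolver looks the requested absolute URI up in a dictionary and hands the DTD text back as a `Stream`, so nothing is downloaded.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;
using System.Xml;

// Hypothetical resolver: serves DTD text from an in-memory table instead of
// downloading it, keyed by the absolute URI the reader asks for.
public sealed class InMemoryDtdResolver : XmlResolver
{
    private readonly IDictionary<string, string> _dtdsByUri;

    public InMemoryDtdResolver(IDictionary<string, string> dtdsByUri)
    {
        if (dtdsByUri == null) throw new ArgumentNullException("dtdsByUri");
        _dtdsByUri = dtdsByUri;
    }

    // Nothing is fetched over the network, so credentials are ignored.
    public override ICredentials Credentials
    {
        set { }
    }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        // Only Stream results are supported (XmlReader normally asks for a Stream).
        if (ofObjectToReturn != null && ofObjectToReturn != typeof(Stream) && ofObjectToReturn != typeof(object))
            throw new XmlException("Unsupported entity type: " + ofObjectToReturn);

        string dtd;
        if (absoluteUri != null && _dtdsByUri.TryGetValue(absoluteUri.AbsoluteUri, out dtd))
            return new MemoryStream(Encoding.UTF8.GetBytes(dtd));

        // Unknown URI: fail instead of silently going to the network.
        throw new XmlException("No in-memory DTD registered for " + absoluteUri);
    }
}
```

To use it, set `XmlReaderSettings.XmlResolver` to an instance and `DtdProcessing` to `DtdProcessing.Parse`; a document whose DOCTYPE names a registered system identifier then takes its entity declarations (for example `<!ENTITY nbsp "&#160;">`) from the dictionary.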

Some time ago I wrote an article that explains how to push a default DTD onto the XmlParserContext if no DTD is declared in your input document. The article is a little dated, but the same concept still works with XmlReaderSettings and XmlReader.Create: use an XmlReader.Create overload that accepts an XmlParserContext object as an argument.
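Roughly, the XmlParserContext technique could look like the sketch below, assuming .NET 4 or later for `DtdProcessing.Parse`. `DefaultDtdReaderFactory` and its two-entity subset are illustrative only; a real table would declare every HTML entity you need. The internal subset passed to the `XmlParserContext` constructor is treated as the document's DTD, so `&nbsp;` is no longer an undeclared entity. As in the article, this is meant for input that has no DOCTYPE of its own.

```csharp
using System.IO;
using System.Xml;

public static class DefaultDtdReaderFactory
{
    // Illustrative internal subset; a real one would declare every HTML entity needed.
    public const string HtmlEntityDeclarations =
        "<!ENTITY nbsp \"&#160;\"><!ENTITY copy \"&#169;\">";

    // Creates a reader that behaves roughly as if the input started with
    // <!DOCTYPE rootName [ HtmlEntityDeclarations ]>.
    public static XmlReader Create(TextReader input, string rootName)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse, // the default (Prohibit) would reject the DTD
            XmlResolver = null                   // the subset is inline, nothing to download
        };
        var context = new XmlParserContext(
            null, null,              // name table / namespace manager: reader defaults
            rootName,                // document type name
            null, null,              // public and system identifiers: none
            HtmlEntityDeclarations,  // internal subset
            "", "", XmlSpace.None);  // base URI, xml:lang, xml:space
        return XmlReader.Create(input, settings, context);
    }
}
```

A public API that accepts caller-supplied settings could apply the same idea by passing its own XmlParserContext when it creates the reader.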

Finally, it looks like .NET 4 will help us out a little with a new XmlResolver derivative named XmlPreloadedResolver, which seems to have the XHTML 1.0 and RSS DTDs built in.
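For reference, wiring up `XmlPreloadedResolver` (namespace `System.Xml.Resolvers`, .NET 4 and later) might look like this sketch; `XhtmlReaderFactory` is a hypothetical wrapper. It only helps when the input actually declares one of the XHTML 1.0 DOCTYPEs, because the resolver's job is to supply that DTD and the entity files it includes from built-in copies instead of the network.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Resolvers;

public static class XhtmlReaderFactory
{
    // Creates a reader whose resolver answers requests for the XHTML 1.0 DTDs
    // (and the entity files they include) from copies built into the framework.
    public static XmlReader Create(TextReader input)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse, // the DTD must be parsed for &nbsp; to be declared
            XmlResolver = new XmlPreloadedResolver(XmlKnownDtds.Xhtml10)
        };
        return XmlReader.Create(input, settings);
    }
}
```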

The funny thing is that, as sergeyt noted, XmlTextReader doesn't care about undefined entities when processing an XML fragment, while XmlReader does!

So in many cases a solution is to try XmlTextReader instead.
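Why the question's first test passes can be seen in a small sketch: `XmlTextReader` uses `EntityHandling.ExpandCharEntities` by default, which reports a general entity such as `&nbsp;` as an `XmlNodeType.EntityReference` node instead of expanding (and checking) it, so a wrapper like HtmlEntityReader gets a chance to substitute a value. The reader from `XmlReader.Create` instead expands the entity itself; the stack trace above shows the error being thrown inside `XmlTextReaderImpl.Read()`, before the wrapper ever sees a node. `EntityReferenceDemo` and its dictionary are hypothetical; this illustrates the idea and is not the asker's class.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

public static class EntityReferenceDemo
{
    // Concatenates the text content of a document, substituting a value from
    // `entities` for every general entity reference the reader reports.
    public static string ReadText(string xml, IDictionary<string, string> entities)
    {
        var sb = new StringBuilder();
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            // Character references are expanded; general entities come back
            // as EntityReference nodes (this is also XmlTextReader's default).
            reader.EntityHandling = EntityHandling.ExpandCharEntities;
            reader.XmlResolver = null; // never fetch external DTDs
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Text:
                    case XmlNodeType.Whitespace:
                    case XmlNodeType.SignificantWhitespace:
                        sb.Append(reader.Value);
                        break;
                    case XmlNodeType.EntityReference:
                        // reader.Name is the entity name, e.g. "nbsp".
                        string value;
                        if (entities.TryGetValue(reader.Name, out value)) sb.Append(value);
                        break;
                }
            }
        }
        return sb.ToString();
    }
}
```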
