
Parse HTML chunk with nbsp in MSXML

I'm trying to load a chunk of HTML into MSXML's DOMDocument. The chunk is valid XML with one exception: it contains &nbsp; entities. MSXML chokes on them and reports "Reference to undefined entity 'nbsp'.".

Can I make MSXML recognize it as valid somehow?

Simple solution: Just run a text replacement of "&nbsp;" to " " before parsing the document. This should work, because the text cannot contain a verbatim &nbsp; that must be left alone (in well-formed XML a literal one would normally be escaped as &amp;nbsp;).
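A minimal sketch of the text replacement, written in Python only so that it can be run anywhere. The helper name `replace_nbsp` is hypothetical, and Python's standard-library parser stands in for MSXML here; with MSXML the cleaned string would be passed to `loadXML` instead.

```python
import xml.etree.ElementTree as ET


def replace_nbsp(chunk: str, replacement: str = " ") -> str:
    """Swap every &nbsp; reference for a literal character before parsing.

    Hypothetical helper: pass replacement="\u00a0" to keep a real
    non-breaking space instead of a normal space.
    """
    return chunk.replace("&nbsp;", replacement)


raw = "<p>Hello&nbsp;world</p>"

# The raw chunk is rejected because nbsp is not a predefined XML entity.
try:
    ET.fromstring(raw)
except ET.ParseError as err:
    print("raw chunk rejected:", err)

# After the replacement the chunk parses normally.
root = ET.fromstring(replace_nbsp(raw))
print(repr(root.text))
```

The replacement is a plain string operation, so it works the same way in whatever language hosts MSXML.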

More standard solution: Declare an nbsp entity in the XML by inserting

<!DOCTYPE foobar [
   <!ENTITY nbsp " " >
]>

before the XML root node.
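The declaration can be prepended programmatically. The sketch below is in Python so that it is runnable without MSXML; `with_nbsp_doctype` is a hypothetical helper, it assumes the chunk does not start with an `<?xml ...?>` declaration (the DOCTYPE would have to come after it), and the standard-library parser is only used to check the result.

```python
import xml.etree.ElementTree as ET


def with_nbsp_doctype(chunk: str, entity_value: str = " ",
                      doctype_name: str = "foobar") -> str:
    """Prepend an internal DTD subset that declares the nbsp entity.

    Hypothetical helper. Assumes `chunk` has no XML declaration and that
    `entity_value` contains no double quote.
    """
    doctype = (
        f"<!DOCTYPE {doctype_name} [\n"
        f'   <!ENTITY nbsp "{entity_value}" >\n'
        "]>\n"
    )
    return doctype + chunk


raw = "<p>Hello&nbsp;world</p>"

# Declared as a normal space, as in the snippet above.
print(repr(ET.fromstring(with_nbsp_doctype(raw)).text))

# Declared as a real non-breaking space via a character reference.
print(repr(ET.fromstring(with_nbsp_doctype(raw, "&#x00A0;")).text))
```

One caveat when carrying this over to MSXML: as far as I know, MSXML 6.0 prohibits DTDs by default, so the `ProhibitDTD` property would likely need to be set to false with `setProperty` before calling `loadXML`.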

You can also use the character 0xA0 (in the text replacement) or the character reference &#x00A0; (in the entity declaration) if you actually want a non-breaking space instead of a normal space.
