How to correctly parse an XML document with arbitrary namespaces
I am trying to parse fairly standard XML documents from various sources that use a schema called MARCXML.
Here are the first few lines of an example XML file that needs to be handled:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<marc:collection xmlns:marc="http://www.loc.gov/MARC21/slim" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">
<marc:record>
<marc:leader>00925njm 22002777a 4500</marc:leader>
And here is one without namespace prefixes:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
<record>
<leader>01142cam 2200301 a 4500</leader>
Key point: to get the XPath queries later in the program to resolve, I have to run a regex routine that registers the namespaces with an XmlNamespaceManager. A manager created from the document's NameTable does not pick up the document's namespace declarations by default. This seems unnecessary to me.
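using System.Text.RegularExpressions;
using System.Xml;

// Matches prefixed declarations such as xmlns:marc="..." (not a default xmlns="...")
Regex xmlNamespace = new Regex("xmlns:(?<PREFIX>[^=]+)=\"(?<URI>[^\"]+)\"", RegexOptions.Compiled);
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(xmlRecord);
// The manager starts out knowing only the built-in xml and xmlns prefixes
XmlNamespaceManager nsMgr = new XmlNamespaceManager(xmlDoc.NameTable);
MatchCollection namespaces = xmlNamespace.Matches(xmlRecord);
foreach (Match n in namespaces)
{
    nsMgr.AddNamespace(n.Groups["PREFIX"].Value, n.Groups["URI"].Value);
}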
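You can avoid running a regex over the raw string by reading the namespace declarations the parser has already turned into attributes. The sketch below shows one possible approach, not the only one. It assumes every declaration the queries need sits on the root element, which is true for both samples above. The helper name NamespaceHelper.FromRootDeclarations and the caller-supplied prefix for a default namespace are hypothetical choices made for illustration. XPath 1.0 has no notion of a default namespace, so the sketch registers a default xmlns="..." declaration under that made-up prefix rather than under the empty prefix.

```csharp
using System.Xml;

public static class NamespaceHelper
{
    // Hypothetical helper: builds a namespace manager from the xmlns attributes on the
    // root element, so no regex over the raw XML string is needed.
    // Assumption: every namespace the XPath queries need is declared on the root element.
    public static XmlNamespaceManager FromRootDeclarations(XmlDocument doc, string defaultPrefix)
    {
        var nsMgr = new XmlNamespaceManager(doc.NameTable);
        XmlElement root = doc.DocumentElement;
        foreach (XmlAttribute attr in root.Attributes)
        {
            if (attr.Prefix == "xmlns")
            {
                // xmlns:marc="..." has Prefix "xmlns" and LocalName "marc"
                nsMgr.AddNamespace(attr.LocalName, attr.Value);
            }
            else if (attr.Name == "xmlns" && attr.Value.Length > 0)
            {
                // xmlns="..." is a default namespace; XPath 1.0 element name tests
                // cannot use it unprefixed, so bind it to the caller-chosen prefix
                nsMgr.AddNamespace(defaultPrefix, attr.Value);
            }
        }
        return nsMgr;
    }
}
```

With a manager built this way, .//marc:leader resolves in the first document. In the second document, .//m:leader resolves if "m" is passed as the default prefix.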
The XPath call looks something like this:
XmlNode leaderNode = xmlDoc.SelectSingleNode(".//" + LeaderNode, nsMgr);
Here LeaderNode is a configurable value: it equals "marc:leader" in the first example and "leader" in the second.
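The second document needs extra care. SelectSingleNode evaluates XPath 1.0, and in XPath 1.0 an unprefixed name such as leader matches only elements in no namespace. The leader element there inherits http://www.loc.gov/MARC21/slim from xmlns="...", so ".//leader" finds nothing even though the element is plainly present. A common workaround, sketched below, is to bind that URI to an arbitrary prefix and query with it. The prefix "m" and the class name DefaultNamespaceDemo are hypothetical and used only for illustration.

```csharp
using System.Xml;

public static class DefaultNamespaceDemo
{
    // The URI declared by xmlns="..." in the second example
    public const string MarcNs = "http://www.loc.gov/MARC21/slim";

    // Unprefixed name test: in XPath 1.0 "leader" means "leader in no namespace",
    // so this returns null for a document whose elements inherit a default namespace.
    public static XmlNode FindUnprefixed(XmlDocument doc)
    {
        var nsMgr = new XmlNamespaceManager(doc.NameTable);
        return doc.SelectSingleNode(".//leader", nsMgr);
    }

    // Workaround: bind the default namespace URI to an arbitrary prefix
    // ("m" is a hypothetical choice) and use that prefix in the query.
    public static XmlNode FindWithPrefix(XmlDocument doc)
    {
        var nsMgr = new XmlNamespaceManager(doc.NameTable);
        nsMgr.AddNamespace("m", MarcNs);
        return doc.SelectSingleNode(".//m:leader", nsMgr);
    }
}
```

The URI does not have to be hard-coded. doc.DocumentElement.NamespaceURI returns it for both sample documents, because the root element is in the MARC21 namespace whether it is written with a prefix or through a default declaration.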
Is there a better, more efficient way to do this? Note: suggestions for solving this with LINQ are welcome, but I would mainly like to know how to solve it using XmlDocument.
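Since LINQ suggestions are welcome, here is one possible LINQ to XML sketch. In System.Xml.Linq an element's name is a namespace URI plus a local name, and the prefix is not part of it. Reading the namespace from the root element therefore gives the same result for the prefixed and the default-namespace documents. This sketch assumes the leader element shares the root element's namespace, and the configurable value becomes the local name "leader" in both cases, which differs from the question's setup. The class name LinqLeader is hypothetical.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class LinqLeader
{
    // Takes the namespace from the root element's own name and combines it with the
    // local name. Prefixes ("marc:" vs. none) do not matter because XName ignores them.
    // Assumption: the target element is in the same namespace as the root element.
    public static string FindLeader(string xml, string localName)
    {
        XDocument doc = XDocument.Parse(xml);
        XNamespace ns = doc.Root.Name.Namespace;
        XElement leader = doc.Root.Descendants(ns + localName).FirstOrDefault();
        return leader?.Value; // null when no matching element exists
    }
}
```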
EDIT: I took GrayWizardx's advice and now have the following code:
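if (LeaderNode.Contains(":"))
{
    // Take the prefix from a configured name such as "marc:leader"
    string prefix = LeaderNode.Substring(0, LeaderNode.IndexOf(':'));
    // Use DocumentElement: FirstChild is the <?xml ...?> declaration when one is present
    XmlNode root = xmlDoc.DocumentElement;
    // Resolve the prefix against the declarations in scope on the root element
    string nameSpace = root.GetNamespaceOfPrefix(prefix);
    nsMgr.AddNamespace(prefix, nameSpace);
}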
Now there's no more dependency on Regex!
If you know that a given element will exist in the document (for example, the root element), you can try using GetNamespaceOfPrefix.
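Here is a self-contained sketch of the GetNamespaceOfPrefix approach. It queries doc.DocumentElement rather than doc.FirstChild. When the input starts with an <?xml ...?> declaration, as both sample files do, FirstChild is that declaration node, and it has no namespace declarations in scope. The helper name PrefixResolver.ManagerFor is hypothetical. GetNamespaceOfPrefix returns an empty string for an undeclared prefix, so the sketch treats that case as an error rather than registering an empty URI.

```csharp
using System;
using System.Xml;

public static class PrefixResolver
{
    // Hypothetical helper: registers the prefix used in a configurable name such as
    // "marc:leader", resolving it against the declarations in scope on the root element.
    public static XmlNamespaceManager ManagerFor(XmlDocument doc, string qualifiedName)
    {
        var nsMgr = new XmlNamespaceManager(doc.NameTable);
        int colon = qualifiedName.IndexOf(':');
        if (colon < 0)
        {
            // No prefix in the configured name, so there is nothing to register
            return nsMgr;
        }
        string prefix = qualifiedName.Substring(0, colon);
        // DocumentElement, not FirstChild: FirstChild may be the <?xml ...?> declaration
        string uri = doc.DocumentElement.GetNamespaceOfPrefix(prefix);
        if (uri.Length == 0)
        {
            throw new ArgumentException($"Prefix '{prefix}' is not declared on the root element.");
        }
        nsMgr.AddNamespace(prefix, uri);
        return nsMgr;
    }
}
```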
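Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.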