
Get nodes from XML files

How do I parse this XML file?

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> 
<sitemap> 
    <loc>link</loc>
    <lastmod>2011-08-17T08:23:17+00:00</lastmod> 
</sitemap> 
<sitemap>
    <loc>link</loc> 
    <lastmod>2011-08-18T08:23:17+00:00</lastmod> 
</sitemap> 
</sitemapindex>

I am new to XML. I tried this, but it does not seem to work:

        XmlDocument xml = new XmlDocument(); //* create an xml document object. 
        xml.Load("sitemap.xml");
        XmlNodeList xnList = xml.SelectNodes("/sitemapindex/sitemap");
        foreach (XmlNode xn in xnList)
        {
            String loc= xn["loc"].InnerText;
            String lastmod= xn["lastmod"].InnerText;
        }

The problem is that the sitemapindex element defines a default namespace. You need to specify that namespace when you select the nodes; otherwise the query will not find them. For instance:

XmlDocument xml = new XmlDocument();
xml.Load("sitemap.xml");
XmlNamespaceManager manager = new XmlNamespaceManager(xml.NameTable);
manager.AddNamespace("s", "http://www.sitemaps.org/schemas/sitemap/0.9");
XmlNodeList xnList = xml.SelectNodes("/s:sitemapindex/s:sitemap", manager);
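
To read the values as the question intended, the selected nodes can be iterated with the same namespace manager. The following is a minimal sketch, assuming the sample sitemap is inlined as a string so it runs without a file on disk; the class name SitemapReader and the sample variable are illustrative and not part of the original code.

```csharp
using System;
using System.Xml;

class SitemapReader
{
    static void Main()
    {
        // Inline copy of the sample XML (assumption: stands in for sitemap.xml).
        const string sample =
            "<sitemapindex xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">" +
            "<sitemap><loc>link</loc><lastmod>2011-08-17T08:23:17+00:00</lastmod></sitemap>" +
            "<sitemap><loc>link</loc><lastmod>2011-08-18T08:23:17+00:00</lastmod></sitemap>" +
            "</sitemapindex>";

        XmlDocument xml = new XmlDocument();
        xml.LoadXml(sample);

        // Map the prefix "s" to the sitemap namespace for use in XPath.
        XmlNamespaceManager manager = new XmlNamespaceManager(xml.NameTable);
        manager.AddNamespace("s", "http://www.sitemaps.org/schemas/sitemap/0.9");

        foreach (XmlNode xn in xml.SelectNodes("/s:sitemapindex/s:sitemap", manager))
        {
            // Child lookups are XPath queries too, so they need the prefix as well.
            string loc = xn.SelectSingleNode("s:loc", manager).InnerText;
            string lastmod = xn.SelectSingleNode("s:lastmod", manager).InnerText;
            Console.WriteLine(loc + " " + lastmod);
        }
    }
}
```

The prefix "s" is arbitrary. It only has to match between the AddNamespace call and the XPath expressions, and it does not need to appear in the XML file itself.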

Normally, when using the XmlNamespaceManager, you can pass an empty string as the prefix to make that namespace the default namespace. So you would think you could do something like this:

// WON'T WORK
XmlDocument xml = new XmlDocument();
xml.Load("sitemap.xml");
XmlNamespaceManager manager = new XmlNamespaceManager(xml.NameTable);
manager.AddNamespace("", "http://www.sitemaps.org/schemas/sitemap/0.9"); //Empty prefix
XmlNodeList xnList = xml.SelectNodes("/sitemapindex/sitemap", manager); //No prefixes in XPath

However, if you try that code, you'll find that it doesn't match any nodes. The reason is that in XPath 1.0 (which is what XmlDocument implements), a name without a prefix is always resolved to the null namespace, not the default namespace. So it doesn't matter whether you specify a default namespace in the XmlNamespaceManager; XPath is not going to use it anyway. To quote the relevant paragraph from the official XPath specification:

A QName in the node test is expanded into an expanded-name using the namespace declarations from the expression context. This is the same way expansion is done for element type names in start and end-tags except that the default namespace declared with xmlns is not used: if the QName does not have a prefix, then the namespace URI is null (this is the same way attribute names are expanded). It is an error if the QName has a prefix for which there is no namespace declaration in the expression context.
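
The rule quoted above can be observed directly by running an unprefixed and a prefixed query against the same document. This is a minimal sketch, assuming the sample sitemap is inlined as a string; the class name PrefixDemo is illustrative only.

```csharp
using System;
using System.Xml;

class PrefixDemo
{
    static void Main()
    {
        // Inline copy of the sample XML (assumption: stands in for sitemap.xml).
        const string ns = "http://www.sitemaps.org/schemas/sitemap/0.9";
        const string sample =
            "<sitemapindex xmlns=\"" + ns + "\">" +
            "<sitemap><loc>link</loc></sitemap>" +
            "<sitemap><loc>link</loc></sitemap>" +
            "</sitemapindex>";

        XmlDocument xml = new XmlDocument();
        xml.LoadXml(sample);

        XmlNamespaceManager manager = new XmlNamespaceManager(xml.NameTable);
        manager.AddNamespace("", ns);  // default namespace: ignored by XPath 1.0 name tests
        manager.AddNamespace("s", ns); // explicit prefix: honored by XPath

        // Unprefixed names resolve to the null namespace, so nothing matches.
        int unprefixed = xml.SelectNodes("/sitemapindex/sitemap", manager).Count;

        // Prefixed names resolve to the sitemap namespace and match both elements.
        int prefixed = xml.SelectNodes("/s:sitemapindex/s:sitemap", manager).Count;

        Console.WriteLine("unprefixed: " + unprefixed); // expected: 0
        Console.WriteLine("prefixed: " + prefixed);     // expected: 2
    }
}
```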

Therefore, when the elements you are reading belong to a namespace, you can't avoid putting a namespace prefix in your XPath expressions. However, if you don't want to hard-code the namespace URI, you can ask the XmlDocument object for the namespace URI of the root element, which in this case is the one you want. For instance:

XmlDocument xml = new XmlDocument();
xml.Load("sitemap.xml");
XmlNamespaceManager manager = new XmlNamespaceManager(xml.NameTable);
manager.AddNamespace("s", xml.DocumentElement.NamespaceURI); //Using xml's properties instead of hard-coded URI
XmlNodeList xnList = xml.SelectNodes("/s:sitemapindex/s:sitemap", manager);

Sitemap has two child nodes, "loc" and "lastmod". The nodes that you are accessing are "name" and "url", which is why you are not getting any result. Also, in your XML file the last sitemap tag is not closed properly with a corresponding closing tag. Kindly try xn["loc"].InnerText and see if you get the desired result.

I would definitely use LINQ to XML instead of the older XmlDocument-based XML API. You can accomplish what you are looking to do using the following code. Note that I changed the names of the elements whose values I read to 'loc' and 'lastmod', because that is what is in your sample XML ('name' and 'url' did not exist):

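XElement element = XElement.Parse(XMLFILE); // XMLFILE holds the XML text; use XElement.Load(path) to read a file
XNamespace ns = element.Name.Namespace;     // the default namespace declared on the root element
IEnumerable<XElement> list = element.Elements(ns + "sitemap"); // element names must include the namespace
foreach (XElement e in list)
{
    String loc = e.Element(ns + "loc").Value;
    String lastmod = e.Element(ns + "lastmod").Value;
}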
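
LINQ to XML matches elements by namespace-qualified name too, so a query against this sample has to combine an XNamespace with each local name. The sketch below is one possible way to do that while also reading lastmod as a typed value; it assumes the sample is inlined as a string, and the class name SitemapLinq and the anonymous-type property names are illustrative only.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class SitemapLinq
{
    static void Main()
    {
        // Inline copy of the sample XML (assumption: stands in for sitemap.xml).
        const string sample =
            "<sitemapindex xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">" +
            "<sitemap><loc>link</loc><lastmod>2011-08-17T08:23:17+00:00</lastmod></sitemap>" +
            "<sitemap><loc>link</loc><lastmod>2011-08-18T08:23:17+00:00</lastmod></sitemap>" +
            "</sitemapindex>";

        XElement root = XElement.Parse(sample);

        // Take the namespace from the root element instead of hard-coding the URI.
        XNamespace ns = root.Name.Namespace;

        var entries = root.Elements(ns + "sitemap")
            .Select(e => new
            {
                // The explicit conversions read the element's text content.
                Loc = (string)e.Element(ns + "loc"),
                // Throws if lastmod is missing; cast to DateTimeOffset? to allow null.
                LastMod = (DateTimeOffset)e.Element(ns + "lastmod")
            });

        foreach (var entry in entries)
        {
            Console.WriteLine(entry.Loc + " " + entry.LastMod.ToString("o"));
        }
    }
}
```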
