How to get xpath from an XmlNode instance

Could someone supply some code that would get the xpath of a System.Xml.XmlNode instance?

Thanks!

Okay, I couldn't resist having a go at it. It'll only work for attributes and elements, but hey... what can you expect in 15 minutes :) Likewise there may very well be a cleaner way of doing it.

It is superfluous to include the index on every element (particularly the root one!) but it's easier than trying to work out whether there's any ambiguity otherwise.

using System;
using System.Text;
using System.Xml;

class Test
{
    static void Main()
    {
        string xml = @"
<root>
  <foo />
  <foo>
     <bar attr='value'/>
     <bar other='va' />
  </foo>
  <foo><bar /></foo>
</root>";
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNode node = doc.SelectSingleNode("//@attr");
        Console.WriteLine(FindXPath(node));
        Console.WriteLine(doc.SelectSingleNode(FindXPath(node)) == node);
    }

    static string FindXPath(XmlNode node)
    {
        StringBuilder builder = new StringBuilder();
        while (node != null)
        {
            switch (node.NodeType)
            {
                case XmlNodeType.Attribute:
                    builder.Insert(0, "/@" + node.Name);
                    node = ((XmlAttribute) node).OwnerElement;
                    break;
                case XmlNodeType.Element:
                    int index = FindElementIndex((XmlElement) node);
                    builder.Insert(0, "/" + node.Name + "[" + index + "]");
                    node = node.ParentNode;
                    break;
                case XmlNodeType.Document:
                    return builder.ToString();
                default:
                    throw new ArgumentException("Only elements and attributes are supported");
            }
        }
        throw new ArgumentException("Node was not in a document");
    }

    static int FindElementIndex(XmlElement element)
    {
        XmlNode parentNode = element.ParentNode;
        if (parentNode is XmlDocument)
        {
            return 1;
        }
        XmlElement parent = (XmlElement) parentNode;
        int index = 1;
        foreach (XmlNode candidate in parent.ChildNodes)
        {
            if (candidate is XmlElement && candidate.Name == element.Name)
            {
                if (candidate == element)
                {
                    return index;
                }
                index++;
            }
        }
        throw new ArgumentException("Couldn't find element within parent");
    }
}

Jon's correct that there are any number of XPath expressions that will yield the same node in an instance document. The simplest way to build an expression that unambiguously yields a specific node is a chain of node tests that use the node position in the predicate, e.g.:

/node()[1]/node()[2]/node()[6]/node()[1]/node()[2]

Obviously, this expression isn't using element names, but then if all you're trying to do is locate a node within a document, you don't need its name. It also can't be used to find attributes (because attributes aren't child nodes of their element and don't have a position; you can only find them by name), but it will find all other node types.

To build this expression, you need to write a method that returns a node's position in its parent's child nodes, because XmlNode doesn't expose that as a property:

static int GetNodePosition(XmlNode child)
{
   for (int i=0; i<child.ParentNode.ChildNodes.Count; i++)
   {
       if (child.ParentNode.ChildNodes[i] == child)
       {
          // tricksy XPath, not starting its positions at 0 like a normal language
          return i + 1;
       }
   }
   throw new InvalidOperationException("Child node somehow not found in its parent's ChildNodes property.");
}

(There's probably a more elegant way to do that using LINQ, since XmlNodeList implements IEnumerable, but I'm going with what I know here.)
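For comparison, here is a minimal sketch of what the LINQ variant might look like. The class name `NodePositionSketch` is made up for illustration. It relies on `Cast<XmlNode>()` because `XmlNodeList` only implements the non-generic `IEnumerable`.

```csharp
using System;
using System.Linq;
using System.Xml;

static class NodePositionSketch
{
    // Returns the 1-based position of child among its parent's child nodes,
    // which is the number that node()[n] expects in XPath.
    public static int GetNodePosition(XmlNode child)
    {
        if (child.ParentNode == null)
            throw new ArgumentException("Node has no parent.", nameof(child));

        // Count the siblings that come before child, then add 1 because
        // XPath positions start at 1.
        int precedingSiblings = child.ParentNode.ChildNodes
            .Cast<XmlNode>()
            .TakeWhile(n => !ReferenceEquals(n, child))
            .Count();
        return precedingSiblings + 1;
    }
}
```

Whether this is more elegant than the plain loop is a matter of taste; both walk the sibling list once.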

Then you can write a recursive method like this:

static string GetXPathToNode(XmlNode node)
{
    if (node.NodeType == XmlNodeType.Attribute)
    {
        // attributes have an OwnerElement, not a ParentNode; also they have
        // to be matched by name, not found by position
        return String.Format(
            "{0}/@{1}",
            GetXPathToNode(((XmlAttribute)node).OwnerElement),
            node.Name
            );            
    }
    if (node.ParentNode == null)
    {
        // the only node with no parent is the root node, which has no path
        return "";
    }
    // the path to a node is the path to its parent, plus "/node()[n]", where 
    // n is its position among its siblings.
    return String.Format(
        "{0}/node()[{1}]",
        GetXPathToNode(node.ParentNode),
        GetNodePosition(node)
        );
}

As you can see, I hacked in a way for it to find attributes as well.

Jon slipped in with his version while I was writing mine. There's something about his code that's going to make me rant a bit now, and I apologize in advance if it sounds like I'm ragging on Jon. (I'm not. I'm pretty sure that the list of things Jon has to learn from me is exceedingly short.) But I think the point I'm going to make is a pretty important one for anyone who works with XML to think about.

I suspect that Jon's solution emerged from something I see a lot of developers do: thinking of XML documents as trees of elements and attributes. I think this largely comes from developers whose primary use of XML is as a serialization format, because all the XML they're used to using is structured this way. You can spot these developers because they're using the terms "node" and "element" interchangeably. This leads them to come up with solutions that treat all other node types as special cases. (I was one of these guys myself for a very long time.)

This feels like it's a simplifying assumption while you're making it. But it's not. It makes problems harder and code more complex. It leads you to bypass the pieces of XML technology (like the node() node test in XPath) that are specifically designed to treat all node types generically.

There's a red flag in Jon's code that would make me query it in a code review even if I didn't know what the requirements are, and that's GetElementsByTagName. Whenever I see that method in use, the question that leaps to mind is always "why does it have to be an element?" And the answer is very often "oh, does this code need to handle text nodes too?"
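To make the node-versus-element distinction concrete, here is a small sketch comparing the `node()` test with the `*` element test on the same parent. The class name and the sample XML are made up for illustration.

```csharp
using System.Xml;

static class NodeTestSketch
{
    // The sample parent holds a text node, a comment, an element and a
    // processing instruction, in that order.
    const string Sample = "<r>text<!--note--><a/><?pi data?></r>";

    // node() matches every child node regardless of its type.
    public static int CountAllChildren()
    {
        var doc = new XmlDocument();
        doc.LoadXml(Sample);
        return doc.SelectNodes("/r/node()").Count;
    }

    // * matches element children only.
    public static int CountElementChildren()
    {
        var doc = new XmlDocument();
        doc.LoadXml(Sample);
        return doc.SelectNodes("/r/*").Count;
    }
}
```

With this sample, the first method sees four children and the second sees one, so a path built from element names cannot address the other three.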

I know, old post, but the version I liked the most (the one with names) was flawed: when a parent node has child nodes with different names, it stopped counting the index after it found the first non-matching node name.

Here is my fixed version of it:

/// <summary>
/// Gets the X-Path to a given Node
/// </summary>
/// <param name="node">The Node to get the X-Path from</param>
/// <returns>The X-Path of the Node</returns>
public string GetXPathToNode(XmlNode node)
{
    if (node.NodeType == XmlNodeType.Attribute)
    {
        // attributes have an OwnerElement, not a ParentNode; also they have             
        // to be matched by name, not found by position             
        return String.Format("{0}/@{1}", GetXPathToNode(((XmlAttribute)node).OwnerElement), node.Name);
    }
    if (node.ParentNode == null)
    {
        // the only node with no parent is the root node, which has no path
        return "";
    }

    // Get the Index
    int indexInParent = 1;
    XmlNode siblingNode = node.PreviousSibling;
    // Loop thru all Siblings
    while (siblingNode != null)
    {
        // Increase the Index if the Sibling has the same Name
        if (siblingNode.Name == node.Name)
        {
            indexInParent++;
        }
        siblingNode = siblingNode.PreviousSibling;
    }

    // the path to a node is the path to its parent, plus "/name[n]", where n is its position among its same-named siblings.
    return String.Format("{0}/{1}[{2}]", GetXPathToNode(node.ParentNode), node.Name, indexInParent);
}
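The difference between the two counting strategies shows up as soon as same-named siblings are separated by a differently named one. The sketch below isolates just the index calculation. The class and method names are made up for illustration and are not part of either answer.

```csharp
using System.Xml;

static class SiblingIndexSketch
{
    // Counts only the unbroken run of same-named siblings directly before node.
    // This is the behaviour described as flawed: it stops at the first
    // sibling with a different name.
    public static int ContiguousIndex(XmlNode node)
    {
        int index = 1;
        for (XmlNode s = node.PreviousSibling;
             s != null && s.Name == node.Name;
             s = s.PreviousSibling)
        {
            index++;
        }
        return index;
    }

    // Counts every preceding sibling with the same name, which is what
    // name[n] means in XPath.
    public static int FullIndex(XmlNode node)
    {
        int index = 1;
        for (XmlNode s = node.PreviousSibling; s != null; s = s.PreviousSibling)
        {
            if (s.Name == node.Name) index++;
        }
        return index;
    }
}
```

For the last child of `<p><a/><b/><a/></p>`, the first method yields 1 and the second yields 2; only `/p/a[2]` selects that node.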

Here's a simple method that I've used; it worked for me.

    static string GetXpath(XmlNode node)
    {
        if (node.Name == "#document")
            return String.Empty;
        return GetXpath(node.SelectSingleNode("..")) + "/" +  (node.NodeType == XmlNodeType.Attribute ? "@":String.Empty) + node.Name;
    }
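A path built from names alone can match more than one node when siblings share a name. One way to check a generated path, sketched here under the assumption that the path is a non-empty absolute expression, is to run it back against the document. `PathCheckSketch` and `IsUnambiguous` are hypothetical names.

```csharp
using System.Xml;

static class PathCheckSketch
{
    // True when xpath selects exactly the given node and nothing else.
    // Assumes xpath is non-empty; SelectNodes throws on an empty expression.
    public static bool IsUnambiguous(XmlNode node, string xpath)
    {
        // OwnerDocument is null when node is itself the document.
        XmlDocument doc = node.OwnerDocument ?? (node as XmlDocument);
        if (doc == null) return false;

        XmlNodeList matches = doc.SelectNodes(xpath);
        return matches.Count == 1 && ReferenceEquals(matches[0], node);
    }
}
```

If the check fails, a positional predicate such as `[2]` on the ambiguous step is the usual remedy.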

My 10p worth is a hybrid of Robert and Corey's answers. I can only claim credit for the actual typing of the extra lines of code.

    private static string GetXPathToNode(XmlNode node)
    {
        if (node.NodeType == XmlNodeType.Attribute)
        {
            // attributes have an OwnerElement, not a ParentNode; also they have
            // to be matched by name, not found by position
            return String.Format(
                "{0}/@{1}",
                GetXPathToNode(((XmlAttribute)node).OwnerElement),
                node.Name
                );
        }
        if (node.ParentNode == null)
        {
            // the only node with no parent is the root node, which has no path
            return "";
        }
        //get the index
        int iIndex = 1;
        XmlNode xnIndex = node;
        while (xnIndex.PreviousSibling != null) { iIndex++; xnIndex = xnIndex.PreviousSibling; }
        // the path to a node is the path to its parent, plus "/node()[n]", where 
        // n is its position among its siblings.
        return String.Format(
            "{0}/node()[{1}]",
            GetXPathToNode(node.ParentNode),
            iIndex
            );
    }

There's no such thing as "the" xpath of a node. For any given node there may well be many xpath expressions which will match it.

You can probably work up the tree to build up an expression which will match it, taking into account the index of particular elements etc, but it's not going to be terribly nice code.

Why do you need this? There may be a better solution.
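As a small illustration of that point, the sketch below selects one element with three different expressions and checks that they all return the same node. The sample document and the class name are made up for this example.

```csharp
using System;
using System.Xml;

static class ManyPathsSketch
{
    public static bool AllSelectSameNode()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<root><foo/><foo><bar attr='value'/></foo></root>");

        // Three different expressions that all identify the same <bar> element.
        XmlNode byIndex = doc.SelectSingleNode("/root[1]/foo[2]/bar[1]");
        XmlNode bySearch = doc.SelectSingleNode("//bar");
        XmlNode byAttribute = doc.SelectSingleNode("//*[@attr='value']");

        return byIndex != null
            && ReferenceEquals(byIndex, bySearch)
            && ReferenceEquals(byIndex, byAttribute);
    }
}
```

Which of these counts as "the" path depends on what the path is needed for, which is why the use case matters.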

If you do this, you will get a path with the names of the nodes AND the position, if you have nodes with the same name, like this: "/Service[1]/System[1]/Group[1]/Folder[2]/File[2]"

public string GetXPathToNode(XmlNode node)
{         
    if (node.NodeType == XmlNodeType.Attribute)
    {             
        // attributes have an OwnerElement, not a ParentNode; also they have             
        // to be matched by name, not found by position             
        return String.Format("{0}/@{1}", GetXPathToNode(((XmlAttribute)node).OwnerElement), node.Name);
    }
    if (node.ParentNode == null)
    {             
        // the only node with no parent is the root node, which has no path
        return "";
    }

    // get the index
    // note: this counts only the unbroken run of same-named siblings directly
    // before the node; counting stops at the first sibling with another name
    int iIndex = 1;
    XmlNode xnIndex = node;
    while (xnIndex.PreviousSibling != null && xnIndex.PreviousSibling.Name == xnIndex.Name)
    {
         iIndex++;
         xnIndex = xnIndex.PreviousSibling; 
    }

    // the path to a node is the path to its parent, plus "/name[n]", where
    // n is its position among its same-named siblings.
    return String.Format("{0}/{1}[{2}]", GetXPathToNode(node.ParentNode), node.Name, iIndex);
}

What about using a class extension? ;) My version (building on others' work) uses the syntax name[index], with the index omitted if the element has no "brothers". The loop to get the element index lives outside, in an independent routine (also a class extension).

Just paste the following into any static utility class (or into the main Program class, if it is static):

static public int GetRank( this XmlNode node )
{
    // return 0 if unique, else return position 1...n in siblings with same name
    try
    {
        if( node is XmlElement ) 
        {
            int rank = 1;
            bool alone = true, found = false;

            foreach( XmlNode n in node.ParentNode.ChildNodes )
                if( n.Name == node.Name ) // sibling with same name
                {
                    if( n.Equals(node) )
                    {
                        if( ! alone ) return rank; // no need to continue
                        found = true;
                    }
                    else
                    {
                        if( found ) return rank; // no need to continue
                        alone = false;
                        rank++;
                    }
                }

        }
    }
    catch{}
    return 0;
}

static public string GetXPath( this XmlNode node )
{
    try
    {
        if( node is XmlAttribute )
            return String.Format( "{0}/@{1}", (node as XmlAttribute).OwnerElement.GetXPath(), node.Name );

        if( node is XmlText || node is XmlCDataSection )
            return node.ParentNode.GetXPath();

        if( node.ParentNode == null )   // the only node with no parent is the root node, which has no path
            return "";

        int rank = node.GetRank();
        if( rank == 0 ) return String.Format( "{0}/{1}",        node.ParentNode.GetXPath(), node.Name );
        else            return String.Format( "{0}/{1}[{2}]",   node.ParentNode.GetXPath(), node.Name, rank );
    }
    catch{}
    return "";
}   

I produced VBA for Excel to do this for a work project. It outputs tuples of an XPath and the associated text from an element or attribute. The purpose was to allow business analysts to identify and map some XML. I appreciate that this is a C# forum, but thought this may be of interest.

Sub Parse2(oSh As Long, inode As IXMLDOMNode, Optional iXstring As String = "", Optional indexes)


Dim chnode As IXMLDOMNode
Dim attr As IXMLDOMAttribute
Dim oXString As String
Dim chld As Long
Dim idx As Variant
Dim addindex As Boolean
chld = 0
idx = 0
addindex = False


'determine the node type:
Select Case inode.NodeType

    Case NODE_ELEMENT
        If inode.ParentNode.NodeType = NODE_DOCUMENT Then 'This gets the root node name but ignores all the namespace attributes
            oXString = iXstring & "//" & fp(inode.nodename)
        Else

            'Need to deal with indexing. Where an element has siblings with the same nodeName,it needs to be indexed using [index], e.g swapstreams or schedules

            For Each chnode In inode.ParentNode.ChildNodes
                If chnode.NodeType = NODE_ELEMENT And chnode.nodename = inode.nodename Then chld = chld + 1
            Next chnode

            If chld > 1 Then '//inode has siblings of the same nodeName, so needs to be indexed
                'Lookup the index from the indexes array
                idx = getIndex(inode.nodename, indexes)
                addindex = True
            Else
            End If

            'build the XString
            oXString = iXstring & "/" & fp(inode.nodename)
            If addindex Then oXString = oXString & "[" & idx & "]"

            'If type is element then check for attributes
            For Each attr In inode.Attributes
                'If the element has attributes then extract the data pair XString + Element.Name, @Attribute.Name=Attribute.Value
                Call oSheet(oSh, oXString & "/@" & attr.Name, attr.Value)
            Next attr

        End If

    Case NODE_TEXT
        'build the XString
        oXString = iXstring
        Call oSheet(oSh, oXString, inode.NodeValue)

    Case NODE_ATTRIBUTE
    'Do nothing
    Case NODE_CDATA_SECTION
    'Do nothing
    Case NODE_COMMENT
    'Do nothing
    Case NODE_DOCUMENT
    'Do nothing
    Case NODE_DOCUMENT_FRAGMENT
    'Do nothing
    Case NODE_DOCUMENT_TYPE
    'Do nothing
    Case NODE_ENTITY
    'Do nothing
    Case NODE_ENTITY_REFERENCE
    'Do nothing
    Case NODE_INVALID
    'do nothing
    Case NODE_NOTATION
    'do nothing
    Case NODE_PROCESSING_INSTRUCTION
    'do nothing
End Select

'Now call Parser2 on each of inode's children.
If inode.HasChildNodes Then
    For Each chnode In inode.ChildNodes
        Call Parse2(oSh, chnode, oXString, indexes)
    Next chnode
Set chnode = Nothing
Else
End If

End Sub

The counting of elements is managed using:

Function getIndex(tag As Variant, indexes) As Variant
'Function to get the latest index for an xml tag from the indexes array
'indexes array is passed from one parser function to the next up and down the tree

Dim i As Integer
Dim n As Integer

If IsArrayEmpty(indexes) Then
    ReDim indexes(1, 0)
    indexes(0, 0) = "Tag"
    indexes(1, 0) = "Index"
Else
End If
For i = 0 To UBound(indexes, 2)
    If indexes(0, i) = tag Then
        'tag found, increment and return the index then exit
        'also destroy all recorded tag names BELOW that level
        indexes(1, i) = indexes(1, i) + 1
        getIndex = indexes(1, i)
        ReDim Preserve indexes(1, i) 'should keep all tags up to i but remove all below it
        Exit Function
    Else
    End If
Next i

'tag not found so add the tag with index 1 at the end of the array
n = UBound(indexes, 2)
ReDim Preserve indexes(1, n + 1)
indexes(0, n + 1) = tag
indexes(1, n + 1) = 1
getIndex = 1

End Function

Another solution to your problem might be to 'mark' the XmlNodes which you will want to identify later with a custom attribute:

var id = _currentNode.OwnerDocument.CreateAttribute("some_id");
id.Value = Guid.NewGuid().ToString();
_currentNode.Attributes.Append(id);

which you can store in a Dictionary, for example. You can later identify the node with an xpath query:

newOrOldDocument.SelectSingleNode(string.Format("//*[@some_id='{0}']", id.Value));

I know this is not a direct answer to your question, but it can help if the reason you wish to know the xpath of a node is to have a way of 'reaching' the node later, after you have lost the reference to it in code.

This also overcomes problems when the document gets elements added or moved, which can mess up the xpath (or the indexes, as suggested in other answers).
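Put together, the marking approach might look like the following sketch. The class and method names are hypothetical; `some_id` is the attribute name from the snippet above, and the lookup matches the attribute value exactly instead of using `contains`.

```csharp
using System;
using System.Xml;

static class NodeMarkerSketch
{
    // Attribute name taken from the answer; any name not already used in the
    // document would work.
    const string MarkerName = "some_id";

    // Tags the element with a fresh GUID and returns the GUID so the caller
    // can store it (in a Dictionary, for example).
    public static string Mark(XmlElement element)
    {
        string id = Guid.NewGuid().ToString();
        element.SetAttribute(MarkerName, id);
        return id;
    }

    // Finds the marked element again, in the same document or in a copy.
    public static XmlNode Find(XmlDocument doc, string id)
    {
        // A GUID string contains only hex digits and hyphens, so it is safe
        // to place inside a quoted XPath literal.
        return doc.SelectSingleNode(
            string.Format("//*[@{0}='{1}']", MarkerName, id));
    }
}
```

The trade-off is that the document itself is modified, so the marker attribute may need to be stripped again before the XML is saved or validated.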

I found that none of the above worked with XDocument, so I wrote my own code to support XDocument and used recursion. I think this code handles multiple identical nodes better than some of the other code here because it first tries to go as deep into the XML path as it can and then backs up to build only what is needed. So if you have /home/white/bob and /home/white/mike and you want to create /home/white/bob/garage, the code will know how to create that. However, I didn't want to mess with predicates or wildcards, so I explicitly disallowed those; but it would be easy to add support for them.

Private Sub NodeItterate(XDoc As XElement, XPath As String)
    'get the deepest path
    Dim nodes As IEnumerable(Of XElement)

    nodes = XDoc.XPathSelectElements(XPath)

    'if it doesn't exist, try the next shallow path
    If nodes.Count = 0 Then
        NodeItterate(XDoc, XPath.Substring(0, XPath.LastIndexOf("/")))
        'by this time all the required parent elements will have been constructed
        Dim ParentPath As String = XPath.Substring(0, XPath.LastIndexOf("/"))
        Dim ParentNode As XElement = XDoc.XPathSelectElement(ParentPath)
        Dim NewElementName As String = XPath.Substring(XPath.LastIndexOf("/") + 1, XPath.Length - XPath.LastIndexOf("/") - 1)
        ParentNode.Add(New XElement(NewElementName))
    End If

    'if we find there are more than 1 elements at the deepest path we have access to, we can't proceed
    If nodes.Count > 1 Then
        Throw New ArgumentOutOfRangeException("There are too many paths that match your expression.")
    End If

    'if there is just one element, we can proceed
    If nodes.Count = 1 Then
        'just proceed
    End If

End Sub

Public Sub CreateXPath(ByVal XDoc As XElement, ByVal XPath As String)

    If XPath.Contains("//") Or XPath.Contains("*") Or XPath.Contains(".") Then
        Throw New ArgumentException("Can't create a path based on searches, wildcards, or relative paths.")
    End If

    'reject the path if it contains any character used in predicates
    If Regex.IsMatch(XPath, "[\[\]()@='<>|]") Then
        Throw New ArgumentException("Can't create a path based on predicates.")
    End If

    'we will process this recursively.
    NodeItterate(XDoc, XPath)

End Sub
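The routine above creates missing elements from a path. For the opposite direction with LINQ to XML, that is, getting a path from an existing XElement as the original question asks, a minimal sketch could look like this. It assumes the document uses no namespaces, and `XElementPathSketch` is a made-up name.

```csharp
using System.Linq;
using System.Xml.Linq;

static class XElementPathSketch
{
    // Builds /name[n]/name[n]/... for an element, where n is the 1-based
    // position among same-named siblings. Assumes no XML namespaces.
    public static string GetXPath(XElement element)
    {
        var steps = element.AncestorsAndSelf()
            .Reverse() // AncestorsAndSelf starts at the element; the path starts at the root
            .Select(e =>
            {
                int index = e.ElementsBeforeSelf(e.Name).Count() + 1;
                return string.Format("{0}[{1}]", e.Name.LocalName, index);
            });
        return "/" + string.Join("/", steps);
    }
}
```

The result can be fed to `XPathSelectElement` from `System.Xml.XPath` on the containing XDocument to get the element back.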

This is even easier:

 ''' <summary>
    ''' Gets the full XPath of a single node.
    ''' </summary>
    ''' <param name="node"></param>
    ''' <returns></returns>
    ''' <remarks></remarks>
    Private Function GetXPath(ByVal node As Xml.XmlNode) As String
        Dim temp As String
        Dim sibling As Xml.XmlNode
        Dim previousSiblings As Integer = 1

        'I don't want to know that it was a generic document
        If node.Name = "#document" Then Return ""

        'Prime it
        sibling = node.PreviousSibling
        'Percolate up, counting this node's same-named siblings before it.
        While sibling IsNot Nothing
            'Only count if the sibling has the same name as this node
            If sibling.Name = node.Name Then
                previousSiblings += 1
            End If
            sibling = sibling.PreviousSibling
        End While

        'Mark this node's index, if it has one.
        'Also mark the index as 1 (the default) if it has a following sibling but no previous one.
        temp = node.Name + IIf(previousSiblings > 1 OrElse node.NextSibling IsNot Nothing, "[" + previousSiblings.ToString() + "]", "").ToString()

        If node.ParentNode IsNot Nothing Then
            Return GetXPath(node.ParentNode) + "/" + temp
        End If

        Return temp
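    End Function

    public static string GetFullPath(this XmlNode node)
    {
        if (node.ParentNode == null)
        {
            // the document node (or a detached node) contributes nothing to the path
            return "";
        }
        else
        {
            // XPath steps are separated by "/", and each step names the node itself
            return $"{GetFullPath(node.ParentNode)}/{node.Name}";
        }
    }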

I had to do this recently. Only elements needed to be considered. This is what I came up with:

    private string GetPath(XmlElement el)
    {
        List<string> pathList = new List<string>();
        XmlNode node = el;
        while (node is XmlElement)
        {
            pathList.Add(node.Name);
            node = node.ParentNode;
        }
        pathList.Reverse();
        string[] nodeNames = pathList.ToArray();
        return String.Join("/", nodeNames);
    }
