简体   繁体   English

获取 XML 中的注释节点

[英]Get commented nodes in XML

Env: Python 3.9.7, Windows 10环境:Python 3.9.7,Windows 10

How can I get XPATHs of the commented nodes?如何获取注释节点的 XPATH?


Example XML (ex.xml)示例 XML(ex.xml)

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E">AUS</neighbor>
        <!-- A1 -->
        <neighbor name="Switzerland" direction="W">SWI</neighbor>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <!-- B1 -->
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>

What I expect我的期望

import xml.etree.ElementTree as et

def parse_commented_nodes(root):
    """
    Returns something like
    {
        "A1" : "./country[@name='Liechtenstein']/neighbor[@name='Austria']",
        "B1" : "./country[@nmae='Singapore']/gdppc"
    }
    """
    return {}

tree = et.parse("ex.xml")
root = tree.getroot()
res = parse_commented_nodes(root)

My idea我的想法

  1. Read the file as a text.将文件作为文本读取。
  2. Find the lines which comes before a comment.找到评论之前的行。
  3. Get parents iteratively from the nodes up to the root.从节点到根迭代地获取父节点。

But I have a problem 'getting parents' from the above method.但是我从上述方法中“得到父母”时遇到了问题。 For example,例如,

annotated_node = root.find(".//neighbor[@name='Austria']")
print(annotated_node.find("..")) # None
print(annotated_node.find("./..")) # None

I have searched ways to get parents (or get full XPATH) of a node using the default xml module of Python but couldn't find an effective one.我已经搜索了使用 Python 的默认xml模块获取节点的父节点(或获取完整 XPATH)的方法,但找不到有效的方法。


How to read commented text from XML file in python 如何从 python 中的 XML 文件中读取注释文本

My question is similar with the above but not a duplicate.我的问题与上述类似,但不重复。 It finds 'comments' but I need 'nodes before comments.'它找到“评论”,但我需要“评论前的节点”。

Problem solved by using lxml as @mzjn suggested.使用@mzjn建议的lxml解决了问题。

from lxml import etree as et

def parse_commented_nodes(tree):
    res = {}
    for node in tree.iter():
        if "function Comment" in str(node.tag):
            res[node.text] = tree.getpath(node.getprevious())
    return res

tree = et.parse("ex.xml")
res = parse_commented_nodes(tree)

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM