Search and remove elements with ElementTree in Python

Question

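I have an XML document in which I want to search for some elements and, if they match some criteria, delete them.

However, I cannot seem to access the parent of an element so that I can delete it.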

import json
from xml.etree import ElementTree

# parse() accepts a file name directly, so no open file is left behind
elem = ElementTree.parse('test.xml')

namespace = "{http://somens}"

props = elem.findall('.//{0}prop'.format(namespace))
for prop in props:
    type = prop.attrib.get('type', None)
    if type == 'json':
        value = json.loads(prop.attrib['value'])
        if value['name'] == 'Page1.Button1':
            # here I need to access the parent of prop
            # in order to delete the prop
            pass

Is there a way I can do this?

Thanks

Answer 1

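You can remove child elements with the corresponding remove method. To remove an element you have to call its parent's remove method. Unfortunately, Element does not provide a reference to its parent, so it is up to you to keep track of parent/child relations (which speaks against your use of elem.findall()).

A proposed solution could look like this: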

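root = elem.getroot()
# Iterate over a copy: the original 'for child in root:' can skip
# siblings when elements are removed during the loop.
for child in list(root):
    # Element has no .name attribute; the qualified tag includes the namespace
    if child.tag != "{http://somens}prop":
        continue
    if True:  # TODO: do your check here!
        root.remove(child)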

PS: Don't use prop.attrib.get(); use prop.get(), as explained here.
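
One way to keep track of parent/child relations without giving up findall() is to build a child-to-parent mapping once and look parents up in it. The following is only a sketch: the sample document is hypothetical, and only the namespace and the prop attributes are taken from the question.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample document; only the namespace and the 'prop'
# attributes follow the question.
XML = """<root xmlns="http://somens">
  <page>
    <prop type="json" value='{"name": "Page1.Button1"}'/>
    <prop type="json" value='{"name": "Page1.Button2"}'/>
    <prop type="text" value="plain"/>
  </page>
</root>"""

NS = "{http://somens}"
root = ET.fromstring(XML)

# Map every element in the tree to its parent.
parent_of = {child: parent for parent in root.iter() for child in parent}

# findall() returns a list, so removing during this loop is safe.
for prop in root.findall(".//{0}prop".format(NS)):
    if prop.get("type") != "json":
        continue
    if json.loads(prop.get("value"))["name"] == "Page1.Button1":
        parent_of[prop].remove(prop)

remaining = [p.get("value") for p in root.iter(NS + "prop")]
```

The mapping is built before anything is removed, so it has to be rebuilt if the tree is restructured afterwards.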

Answer 2

You can use XPath to select an element's parent.

file = open('test.xml', "r")
elem = ElementTree.parse(file)

namespace = "{http://somens}"

props = elem.findall('.//{0}prop'.format(namespace))
for prop in props:
    type = prop.get('type', None)
    if type == 'json':
        value = json.loads(prop.attrib['value'])
        if value['name'] == 'Page1.Button1':
            # Get parent and remove this prop
            parent = prop.find("..")
            parent.remove(prop)

http://docs.python.org/2/library/xml.etree.elementtree.html#supported-xpath-syntax

Except that if you try it, it doesn't work: http://elmpowered.skawaii.net/?p=74

So instead you have to:
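
The difference can be seen on a tiny, hypothetical document: a ".." step evaluated from the element itself has nothing above it to return, while the same step evaluated from an ancestor resolves normally.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-level document.
root = ET.fromstring("<root><page><prop/></page></root>")
prop = root.find(".//prop")

# Relative to the element itself there is nothing above it to return.
parent_from_child = prop.find("..")

# Relative to an ancestor, the same step resolves as expected.
parent_from_root = root.find(".//prop/..")
```

Here parent_from_child is None, whereas parent_from_root is the page element.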

import json
from xml.etree import ElementTree

elem = ElementTree.parse('test.xml')

namespace = "{http://somens}"
search = './/{0}prop'.format(namespace)

# Use XPath to get all parents of props
prop_parents = elem.findall(search + '/..')
for parent in prop_parents:
    # Still have to find and iterate through the direct child props;
    # remove() raises ValueError for anything deeper than a direct child
    for prop in parent.findall(namespace + 'prop'):
        type = prop.get('type', None)
        if type == 'json':
            value = json.loads(prop.attrib['value'])
            if value['name'] == 'Page1.Button1':
                parent.remove(prop)

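That is two searches and a nested loop. The inner search only runs on elements known to contain props as direct children, but that may not mean much depending on your schema.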

Answer 3

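I know this is an old thread, but it kept popping up while I was trying to figure out a similar task. I did not like the accepted answer for two reasons:

1) It doesn't handle multiple nested levels of tags.

2) It will break if multiple XML tags at the same level are deleted one after another. Since each element is an index of Element._children, you should not delete while iterating forward.

I think a better, more versatile solution is this: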

import xml.etree.ElementTree as et
file = 'test.xml'
tree = et.parse(file)
root = tree.getroot()

def iterator(parents, nested=False):
    # Walk the children backwards so removals don't shift unvisited siblings
    for child in reversed(list(parents)):
        if nested:
            if len(child) >= 1:
                # Pass nested along so the recursion reaches every level
                iterator(child, nested)
        if True:  # Add your entire condition here
            parents.remove(child)

iterator(root, nested=True)

For the OP, this should work, but I don't have the data you're working with to test whether it's perfect.

import json
import xml.etree.ElementTree as et
file = 'test.xml'
tree = et.parse(file)

namespace = "{http://somens}"
prop_tag = '{0}prop'.format(namespace)

def iterator(parents, nested=False):
    for child in reversed(list(parents)):
        if nested:
            if len(child) >= 1:
                iterator(child, nested)
        # Test the child itself; remove() only works on its real parent
        if child.tag == prop_tag and child.get('type') == 'json':
            value = json.loads(child.get('value'))
            if value['name'] == 'Page1.Button1':
                parents.remove(child)

# Start from the root element so removals happen in the tree itself
iterator(tree.getroot(), nested=True)
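
Since the answer could not be tested against real data, here is a small self-contained check of the same idea (recurse first, walk children in reverse, remove from the direct parent). The sample document and the helper names is_target and prune are made up for illustration; only the namespace and the JSON condition come from the question.

```python
import json
import xml.etree.ElementTree as ET

NS = "{http://somens}"

# Hypothetical document with matching props at two nesting levels,
# including two adjacent matches.
XML = """<root xmlns="http://somens">
  <prop type="json" value='{"name": "Page1.Button1"}'/>
  <prop type="json" value='{"name": "Page1.Button1"}'/>
  <group>
    <prop type="json" value='{"name": "Page1.Button1"}'/>
    <prop type="json" value='{"name": "Page1.Label1"}'/>
  </group>
</root>"""


def is_target(element):
    """Return True for the props the question wants to delete."""
    if element.tag != NS + "prop" or element.get("type") != "json":
        return False
    return json.loads(element.get("value"))["name"] == "Page1.Button1"


def prune(parent):
    """Remove matching descendants of parent, deepest first."""
    for child in reversed(list(parent)):
        prune(child)
        if is_target(child):
            parent.remove(child)


root = ET.fromstring(XML)
prune(root)
names = [json.loads(p.get("value"))["name"] for p in root.iter(NS + "prop")]
```

After the call, only the Page1.Label1 prop is left and the group element itself is untouched.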

Answer 4

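Using the fact that every child must have a parent, I'm going to simplify @kitsu.eb's example. If you use findall to get the children and the parents, their indices will be equivalent. Note that this only holds when every parent contains exactly one matching child, because ElementTree returns each parent only once for a path ending in '/..'.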

import json
from xml.etree import ElementTree

elem = ElementTree.parse('test.xml')

namespace = "{http://somens}"
search = './/{0}prop'.format(namespace)

# Use XPath to get all parents of props (each parent is returned once)
prop_parents = elem.findall(search + '/..')

props = elem.findall(search)
for prop in props:
    if prop.get('type') == 'json':
        value = json.loads(prop.get('value'))
        if value['name'] == 'Page1.Button1':
            # Use the index of the current child to find its parent and
            # remove the child. Caveat: the indices only line up when
            # every parent holds exactly one prop.
            prop_parents[props.index(prop)].remove(prop)
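
Whether the two lists really line up depends on the document. A quick check on a hypothetical document in which one parent holds two props shows the lists drifting apart, so it is worth verifying the lengths before relying on the index trick.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the first parent holds two props.
root = ET.fromstring(
    "<root>"
    "<a><prop id='1'/><prop id='2'/></a>"
    "<b><prop id='3'/></b>"
    "</root>"
)

props = root.findall(".//prop")
parents = root.findall(".//prop/..")

# Each parent is reported once, however many props it contains.
counts = (len(props), len(parents))
parent_tags = [p.tag for p in parents]
```

With three props and two parents, props[1] would be paired with b although it lives under a.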

Answer 5

A solution using the lxml module

from lxml import etree

# xml_str holds the XML document as a string
root = etree.fromstring(xml_str)
for e in root.findall('.//{http://some.name.space}node'):
    # Unlike the standard library, lxml elements know their parent
    parent = e.getparent()
    if parent is not None:
        parent.remove(e)

Answer 6

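I like to use an XPath expression for this kind of filtering. Unless I know otherwise, such an expression must be applied at the root level, which means I can't just get a parent and apply the same expression on that parent. However, it seems to me that there is a nice and flexible solution that should work with any supported XPath, as long as none of the sought nodes is the root. It goes something like this: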

root = elem.getroot()
# Find all nodes matching the filter string (flt)
nodes = root.findall(flt)
while len(nodes):
    # As long as there are nodes, there should be parents
    # Get the first of all parents to the found nodes
    parent = root.findall(flt+'/..')[0]
    # Use this parent to remove the first node
    parent.remove(nodes[0])
    # Find all remaining nodes
    nodes = root.findall(flt)
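
To make the loop concrete, here is the same approach run end to end. The document and the value of flt are assumptions chosen for illustration; any path that ElementTree supports should do, provided it never matches the root.

```python
import xml.etree.ElementTree as ET

# Hypothetical document and filter string.
root = ET.fromstring(
    "<root>"
    "<page><prop type='json'/><prop type='text'/></page>"
    "<page><prop type='json'/></page>"
    "</root>"
)
flt = ".//prop[@type='json']"

nodes = root.findall(flt)
while nodes:
    # The first parent in document order owns the first matching node.
    parent = root.findall(flt + "/..")[0]
    parent.remove(nodes[0])
    # Search again, because the tree has changed.
    nodes = root.findall(flt)

left = [p.get("type") for p in root.iter("prop")]
```

The tree is searched twice per removed node, which is simple but can get slow on large documents with many matches.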

Answer 7

I also used XPath for this issue, but in a different way:

root = elem.getroot()    
elementName = "YourElement"
#this will find all the parents of the elements with elementName
for elementParent in root.findall(".//{}/..".format(elementName)):
   #this will find all the elements under the parent, and remove them
   for element in elementParent.findall("{}".format(elementName)):
      elementParent.remove(element)

Answer 8

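I would have liked to just add a comment to the accepted answer, but my lack of reputation doesn't allow me to. I wanted to add that it is important to add .findall("*") to the iterator to avoid issues, as stated in the documentation:

Note that concurrent modification while iterating can lead to problems, just like when iterating and modifying Python lists or dicts. Therefore, the example first collects all matching elements with root.findall(), and only then iterates over the list of matches.

Therefore, in the accepted answer the iteration should be for child in root.findall("*"): instead of for child in root:. Not doing so made my code skip some elements from the list.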
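
The skipping is easy to reproduce. This sketch uses a hypothetical flat document and compares removing while iterating over the element itself with removing while iterating over the list returned by findall("*").

```python
import xml.etree.ElementTree as ET


def build():
    # Hypothetical flat document with four sibling elements.
    return ET.fromstring("<root><a/><b/><c/><d/></root>")


# Removing while iterating over the element itself.
unsafe = build()
for child in unsafe:
    unsafe.remove(child)
unsafe_left = [c.tag for c in unsafe]

# Removing while iterating over a list collected beforehand.
safe = build()
for child in safe.findall("*"):
    safe.remove(child)
safe_left = [c.tag for c in safe]
```

In the first loop some siblings survive because the remaining children shift position after each removal; the second loop empties the element.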

8 answers

Answer 1: 35, accepted, 2011-07-27 16:07:51
Answer 2: 6, 2013-06-01 00:07:19
Answer 3: 5, 2017-08-23 01:18:08
Answer 4: 2, 2016-08-13 14:50:45
Answer 5: 2, 2018-04-03 13:41:28
Answer 6: 1, 2018-02-06 07:43:35
Answer 7: 1, 2021-08-23 13:16:09
Answer 8: 0, 2022-06-01 14:32:39
