How to iterate over the children of child elements in XML with Python?
I have an XML structure like this:
<pages>
  <page>
    <textbox>
      <new_line>
        <text>
        </text>
      </new_line>
    </textbox>
  </page>
</pages>
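I am iterating over the text elements that are children of each new_line element, in order to merge adjacent tags that have the same size attribute. However, I want to require that the new_line element sits inside a textbox element. I tried adding another for loop to my code, but it does not work at all. Here is the code: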
import lxml.etree as etree

parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse('output22.xml', parser)
root = tree.getroot()

# Iterate over every new_line block
for new_line_block in tree.xpath('//new_line'):
    # Find all "text" elements in the new_line block
    list_text_elts = new_line_block.findall('text')
    # Iterate over all of them, pairing each with the previous one
    for previous_text, current_text in zip(list_text_elts[:-1], list_text_elts[1:]):
        # Get the size attributes
        prev_size = previous_text.attrib.get('size')
        curr_size = current_text.attrib.get('size')
        # If they are equal and not both None
        if curr_size == prev_size and curr_size is not None:
            # Get current and previous text
            pt = previous_text.text if previous_text.text is not None else ""
            ct = current_text.text if current_text.text is not None else ""
            # Add them to the current element
            current_text.text = pt + ct
            # Remove the previous element
            previous_text.getparent().remove(previous_text)

newtree = etree.tostring(root, encoding='utf-8', pretty_print=True)
#newtree = newtree.decode("utf-8")
print(newtree)
with open("output2.xml", "wb") as f:
    f.write(newtree)
Edit:
Sample string:
"""<?xml version="1.0" encoding="utf-8"?>
<pages>
  <page>
    <textbox>
      <new_line>
        <text size="12.482">C</text>
        <text size="12.333">A</text>
        <text size="12.333">P</text>
        <text size="12.333">I</text>
        <text size="12.482">T</text>
        <text size="12.482">O</text>
        <text size="12.482">L</text>
        <text size="12.482">O</text>
        <text></text>
        <text size="12.482">I</text>
        <text size="12.482">I</text>
        <text size="12.482">I</text>
        <text></text>
      </new_line>
    </textbox>
  </page>
</pages>
"""
Expected output:
<pages>
  <page>
    <textbox>
      <new_line>
        <text size="12.482">C</text>
        <text size="12.333">API</text>
        <text size="12.482">TOLO</text>
        <text/>
        <text size="12.482">III</text>
        <text/>
      </new_line>
    </textbox>
  </page>
</pages>
You can define a recursive function to handle the multi-level XML in your case. I wrote a short piece of code for this problem.
import xml.etree.ElementTree as etree


def add_sub_element(parent, tag, attrib, text=None):
    # Create a child of `parent` and set its text only when there is some
    new_feed = etree.SubElement(parent, tag, attrib)
    if text:
        new_feed.text = text
    return new_feed


def my_tree_mapper(parent_tag, current, element):
    if current.tag == 'new_line' and parent_tag == 'textbox':
        current_size = None  # size of the run being merged; None when no run is open
        current_text = ""
        for child in element:
            child_tag = child.tag
            child_attrib = child.attrib
            child_text = child.text or ""
            if child_tag == 'text' and 'size' in child_attrib:
                if child_attrib['size'] == current_size:
                    # For 'text' children with the same size,
                    # append text until we get a different size
                    current_text = current_text + child_text
                else:
                    if current_size is not None:
                        # Add the merged element to the tree when the size changes
                        add_sub_element(
                            current, 'text', {'size': current_size}, current_text)
                    current_size = child_attrib['size']
                    current_text = child_text
            else:
                if current_size is not None:
                    # Or add the merged element when we reach a child without a size
                    add_sub_element(
                        current, 'text', {'size': current_size}, current_text)
                # No merge logic for this child: copy it unchanged
                sub_element = add_sub_element(
                    current, child_tag, child_attrib, child.text)
                my_tree_mapper(current.tag, sub_element, child)
                current_size = None
                current_text = ""
        if current_size is not None:
            # Write out a run that reaches the end of new_line
            add_sub_element(
                current, 'text', {'size': current_size}, current_text)
    else:
        # Condition not satisfied: copy the children unchanged and recurse
        for child in element:
            child_tag = child.tag
            child_attrib = child.attrib
            child_text = child.text
            sub_element = add_sub_element(
                current, child_tag, child_attrib, child_text)
            my_tree_mapper(current.tag, sub_element, child)


the_input = """<?xml version="1.0" encoding="utf-8"?>
<pages>
<page>
<textbox>
<new_line>
<text size="12.482">C</text>
<text size="12.333">A</text>
<text size="12.333">P</text>
<text size="12.333">I</text>
<text size="12.482">T</text>
<text size="12.482">O</text>
<text size="12.482">L</text>
<text size="12.482">O</text>
<text></text>
<text size="12.482">I</text>
<text size="12.482">I</text>
<text size="12.482">I</text>
<text></text>
</new_line>
</textbox>
</page>
</pages>
"""
tree = etree.ElementTree(etree.fromstring(the_input))
root = tree.getroot()
new_root = etree.Element(root.tag, root.attrib)
my_tree_mapper('', new_root, root)
# encoding='unicode' returns a str instead of a bytes object
print(etree.tostring(new_root, encoding='unicode'))
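For comparison, here is a minimal sketch of a non-recursive variant that edits the tree in place, assuming the only requirement is that new_line must be a direct child of textbox. It is not the code above rewritten, just another way to express the same idea. The function name merge_same_size_runs and the shortened sample are made up for illustration. The path './/textbox/new_line' is the standard-library ElementTree form of the restriction; with lxml the corresponding XPath would be '//textbox/new_line'.

```python
import xml.etree.ElementTree as ET


def merge_same_size_runs(root):
    """Merge adjacent <text> siblings with equal size, only under textbox/new_line."""
    for new_line in root.iterfind('.//textbox/new_line'):
        previous = None  # last kept <text> that carries a size, if any
        for child in list(new_line):  # iterate over a copy: children get removed
            size = child.get('size') if child.tag == 'text' else None
            if (size is not None and previous is not None
                    and previous.get('size') == size):
                # Same size as the run in progress: fold the text in and drop the node
                previous.text = (previous.text or '') + (child.text or '')
                new_line.remove(child)
            else:
                # A new run starts here, or a size-less child breaks the run
                previous = child if size is not None else None
    return root


# Shortened, hypothetical sample; the second new_line is NOT inside a textbox
sample = (
    '<pages><page><textbox><new_line>'
    '<text size="12.482">C</text>'
    '<text size="12.333">A</text>'
    '<text size="12.333">P</text>'
    '<text size="12.333">I</text>'
    '<text size="12.482">T</text>'
    '<text size="12.482">O</text>'
    '<text/>'
    '<text size="12.482">I</text>'
    '<text size="12.482">I</text>'
    '</new_line></textbox>'
    '<new_line><text size="1">x</text><text size="1">y</text></new_line>'
    '</page></pages>'
)

merged = merge_same_size_runs(ET.fromstring(sample))
print([t.text for t in merged.iter('text')])
# → ['C', 'API', 'TO', None, 'II', 'x', 'y']
```

The new_line that sits directly under page is left alone, which is the restriction the question asks for. Note that removing an element with ElementTree also drops its tail text, so indentation in the output may differ from the input.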
I hope this helps, or at least gives you some ideas.
(If you want to read more, see the documentation and examples on recursive functions, and the XML etree documentation for more about its methods.)