
How to remove root element from xml file using python

I have some XML files in the following format:

<objects>
   <object>
      <record>
         <invoice_source>EMAIL</invoice_source>
         <invoice_capture_date>2022-11-18</invoice_capture_date>
         <document_type>INVOICE</document_type>
         <data_capture_provider_code>00001</data_capture_provider_code>
         <data_capture_provider_reference>1264</data_capture_provider_reference>
         <document_capture_provide_code>00002</document_capture_provide_code>
         <document_capture_provider_ref>1264</document_capture_provider_ref>
         <rows/>
      </record>
   </object>
</objects>

This XML has two wrapper levels at the top, <objects> and <object>. I want to remove one of them, so that the XML looks like this:

 <objects>
     <record>
         <invoice_source>EMAIL</invoice_source>
         <invoice_capture_date>2022-11-18</invoice_capture_date>
         <document_type>INVOICE</document_type>
         <data_capture_provider_code>00001</data_capture_provider_code>
         <data_capture_provider_reference>1264</data_capture_provider_reference>
         <document_capture_provide_code>00002</document_capture_provide_code>
         <document_capture_provider_ref>1264</document_capture_provider_ref>
         <rows/>
     </record>
 </objects>

I have a folder full of these files and I want to do this with Python. Is there a way?

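The straightforward way is shown below. If your real files are more complex than one object/one record, you will need to be more specific and give an example: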

from xml.etree import ElementTree as et

xml = '''\
<objects>
   <object>
      <record>
         <invoice_source>EMAIL</invoice_source>
         <invoice_capture_date>2022-11-18</invoice_capture_date>
         <document_type>INVOICE</document_type>
         <data_capture_provider_code>00001</data_capture_provider_code>
         <data_capture_provider_reference>1264</data_capture_provider_reference>
         <document_capture_provide_code>00002</document_capture_provide_code>
         <document_capture_provider_ref>1264</document_capture_provider_ref>
         <rows/>
      </record>
   </object>
</objects>
'''

objects = et.fromstring(xml)
objects.append(objects[0][0]) # move "record" out of "object" and append as child to "objects"
objects.remove(objects[0])    # remove empty "object"
et.indent(objects)            # reformat indentation (Python 3.9+)
et.dump(objects)              # show result

Output:

<objects>
  <record>
    <invoice_source>EMAIL</invoice_source>
    <invoice_capture_date>2022-11-18</invoice_capture_date>
    <document_type>INVOICE</document_type>
    <data_capture_provider_code>00001</data_capture_provider_code>
    <data_capture_provider_reference>1264</data_capture_provider_reference>
    <document_capture_provide_code>00002</document_capture_provide_code>
    <document_capture_provider_ref>1264</document_capture_provider_ref>
    <rows />
  </record>
</objects>
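
If the real files do contain several <object> wrappers, or several <record> elements per wrapper, the same idea can be generalized. The following is only a sketch under that assumption; the helper name flatten_objects and the sample ids are made up for illustration and are not part of the answer above:

```python
from xml.etree import ElementTree as et

def flatten_objects(root, wrapper_tag="object"):
    """Move the children of every <wrapper_tag> child up into root, keeping order."""
    new_children = []
    for child in list(root):              # snapshot of the current children
        if child.tag == wrapper_tag:
            new_children.extend(child)    # the wrapper's own children, e.g. <record>
        else:
            new_children.append(child)    # leave any other element where it is
    root[:] = new_children                # replace all children in one step
    return root

xml = '''\
<objects>
   <object><record><id>1</id></record><record><id>2</id></record></object>
   <object><record><id>3</id></record></object>
</objects>
'''

objects = flatten_objects(et.fromstring(xml))
print([record.find("id").text for record in objects])  # ['1', '2', '3']
```

Building a new child list first avoids modifying <objects> while it is being iterated.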

Another option, which keeps whatever is nested inside object:

objects = et.fromstring(xml)
objects = objects[0]     # extract "object" (lose "objects" layer)
objects.tag = 'objects'  # rename "object" tag
et.indent(objects)       # reformat indentation (Python 3.9+)
et.dump(objects)         # show result (same output)
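
Note that taking objects[0] keeps only the first child, so this variant presumably fits files with exactly one <object>. A hedged sketch of that check follows; promote_single_wrapper is a hypothetical helper name, not something from the answer:

```python
from xml.etree import ElementTree as et

def promote_single_wrapper(root):
    """Return root's only child, renamed to root's tag; refuse anything else."""
    if len(root) != 1:
        raise ValueError(f"expected exactly one child, found {len(root)}")
    inner = root[0]
    inner.tag = root.tag   # e.g. rename "object" to "objects"
    inner.tail = None      # drop whitespace that followed the old closing tag
    return inner

single = et.fromstring("<objects><object><record/></object></objects>")
promoted = promote_single_wrapper(single)
print(promoted.tag, [child.tag for child in promoted])  # objects ['record']
```

With two or more <object> children the function raises instead of silently discarding the extra ones.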

My approach is to iterate over the children of <objects>, i.e. the <object> nodes, and move each <record> node up one level. After that, I can remove the <object> node.

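import xml.etree.ElementTree as ET

doc = ET.parse("input.xml")
objects = doc.getroot()

# findall() returns a list, so <objects> can be modified safely inside the loop
for obj in objects.findall("object"):
    for record in list(obj):
        objects.append(record)  # move <record> up one level
    objects.remove(obj)         # drop the now-redundant <object>

doc.write("output.xml")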

Here are the contents of output.xml:

<objects>
   <record>
         <invoice_source>EMAIL</invoice_source>
         <invoice_capture_date>2022-11-18</invoice_capture_date>
         <document_type>INVOICE</document_type>
         <data_capture_provider_code>00001</data_capture_provider_code>
         <data_capture_provider_reference>1264</data_capture_provider_reference>
         <document_capture_provide_code>00002</document_capture_provide_code>
         <document_capture_provider_ref>1264</document_capture_provider_ref>
         <rows />
      </record>
   </objects>
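
Since the question mentions a whole folder of such files, the same steps can be wrapped in a loop over the directory. This is a minimal sketch, not a tested production script: the function names unwrap_file and unwrap_folder and the folder names in the usage comment are assumptions for illustration. It also calls ET.indent (Python 3.9+) when available, which evens out the lopsided indentation seen in the output above.

```python
from pathlib import Path
import xml.etree.ElementTree as ET

def unwrap_file(src, dst, wrapper_tag="object"):
    """Parse src, lift the children of each <wrapper_tag> into the root, write dst."""
    doc = ET.parse(src)
    root = doc.getroot()
    for wrapper in root.findall(wrapper_tag):  # findall() returns a list
        root.extend(list(wrapper))             # move the <record> children up
        root.remove(wrapper)                   # drop the emptied wrapper
    if hasattr(ET, "indent"):                  # ET.indent exists from Python 3.9
        ET.indent(root)
    doc.write(dst, encoding="utf-8", xml_declaration=True)

def unwrap_folder(in_dir, out_dir):
    """Process every *.xml file in in_dir and write the results to out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(Path(in_dir).glob("*.xml")):
        dst = out / src.name
        unwrap_file(src, dst)
        written.append(dst)
    return written

# Usage (hypothetical folder names):
# unwrap_folder("invoices", "invoices_flat")
```

Writing to a separate output folder leaves the original files untouched, which makes it easy to compare before and after.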
