How to remove root element from xml file using python

I have some XML files in the following format:

<objects>
   <object>
      <record>
         <invoice_source>EMAIL</invoice_source>
         <invoice_capture_date>2022-11-18</invoice_capture_date>
         <document_type>INVOICE</document_type>
         <data_capture_provider_code>00001</data_capture_provider_code>
         <data_capture_provider_reference>1264</data_capture_provider_reference>
         <document_capture_provide_code>00002</document_capture_provide_code>
         <document_capture_provider_ref>1264</document_capture_provider_ref>
         <rows/>
      </record>
   </object>
</objects>

This XML has two nested wrapper elements at the top, <objects> and <object>. I want to remove one of them, so that the XML looks like this:

 <objects>
     <record>
         <invoice_source>EMAIL</invoice_source>
         <invoice_capture_date>2022-11-18</invoice_capture_date>
         <document_type>INVOICE</document_type>
         <data_capture_provider_code>00001</data_capture_provider_code>
         <data_capture_provider_reference>1264</data_capture_provider_reference>
         <document_capture_provide_code>00002</document_capture_provide_code>
         <document_capture_provider_ref>1264</document_capture_provider_ref>
         <rows/>
     </record>
 </objects>

I have a folder full of these files, and I want to do this with Python. Is there a way?

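A straightforward way is shown below. If your real files are more complex than one object/one record, you will need to be more specific and include an example: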

from xml.etree import ElementTree as et

xml = '''\
<objects>
   <object>
      <record>
         <invoice_source>EMAIL</invoice_source>
         <invoice_capture_date>2022-11-18</invoice_capture_date>
         <document_type>INVOICE</document_type>
         <data_capture_provider_code>00001</data_capture_provider_code>
         <data_capture_provider_reference>1264</data_capture_provider_reference>
         <document_capture_provide_code>00002</document_capture_provide_code>
         <document_capture_provider_ref>1264</document_capture_provider_ref>
         <rows/>
      </record>
   </object>
</objects>
'''

objects = et.fromstring(xml)
objects.append(objects[0][0]) # move "record" out of "object" and append as child to "objects"
objects.remove(objects[0])    # remove empty "object"
et.indent(objects)            # reformat indentation (Python 3.9+)
et.dump(objects)              # show result

Output:

<objects>
  <record>
    <invoice_source>EMAIL</invoice_source>
    <invoice_capture_date>2022-11-18</invoice_capture_date>
    <document_type>INVOICE</document_type>
    <data_capture_provider_code>00001</data_capture_provider_code>
    <data_capture_provider_reference>1264</data_capture_provider_reference>
    <document_capture_provide_code>00002</document_capture_provide_code>
    <document_capture_provider_ref>1264</document_capture_provider_ref>
    <rows />
  </record>
</objects>
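
If a file holds several <object> wrappers, or several records per wrapper, the same move-and-remove idea can probably be generalized. The following is only a minimal sketch, assuming every wrapper is a direct child of the root; flatten_objects, wrapper_tag and the sample data are illustrative names, not part of the answer above:

```python
from xml.etree import ElementTree as et

def flatten_objects(root, wrapper_tag="object"):
    """Replace every wrapper child of root by that wrapper's own children."""
    new_children = []
    for child in list(root):              # copy: root is mutated inside the loop
        if child.tag == wrapper_tag:
            new_children.extend(child)    # promote the wrapper's children
        else:
            new_children.append(child)    # keep unrelated children as they are
        root.remove(child)
    root.extend(new_children)             # re-attach in the original order
    return root

# Hypothetical sample: two wrappers, three records in total.
sample = (
    "<objects>"
    "<object><record><id>1</id></record></object>"
    "<object><record><id>2</id></record><record><id>3</id></record></object>"
    "</objects>"
)
root = flatten_objects(et.fromstring(sample))
print(et.tostring(root, encoding="unicode"))
```

Collecting the children first and re-attaching them at the end keeps the records in document order, even when wrappers and other elements are mixed.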

Another option, which handles any content nested inside object:

objects = et.fromstring(xml)
objects = objects[0]     # extract "object" (lose "objects" layer)
objects.tag = 'objects'  # rename "object" tag
et.indent(objects)       # reformat indentation (Python 3.9+)
et.dump(objects)         # show result (same output)
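
Note that taking objects[0] and renaming it keeps only the first child of the root, so any sibling elements, and any attributes on the original root, are dropped. A hedged sketch of the same idea with a guard for that case follows; unwrap_single_child is a hypothetical helper name, and the inline sample is made up for illustration:

```python
from xml.etree import ElementTree as et

def unwrap_single_child(root):
    """Return root's only child, renamed to root's tag.

    Raises ValueError when root does not have exactly one child, because
    the rename trick would silently discard the other children.
    """
    if len(root) != 1:
        raise ValueError(
            f"expected exactly one child under <{root.tag}>, found {len(root)}"
        )
    inner = root[0]
    inner.tag = root.tag   # e.g. "object" becomes "objects"
    inner.tail = None      # drop whitespace that followed the old wrapper
    return inner

doc = et.fromstring("<objects><object><record><rows/></record></object></objects>")
result = unwrap_single_child(doc)
print(et.tostring(result, encoding="unicode"))
```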

My approach is to iterate over the children of <objects>, i.e. the <object> nodes, and move each <record> node up one level. After that, I can remove the <object> node.

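import xml.etree.ElementTree as ET

doc = ET.parse("input.xml")
objects = doc.getroot()

# Iterate over a snapshot: removing children while looping over the live
# element would skip siblings when there is more than one <object>.
for obj in list(objects):
    for record in obj:
        objects.append(record)  # move <record> up one level
    objects.remove(obj)         # drop the now-redundant <object>

doc.write("output.xml")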

Here are the contents of output.xml:

<objects>
   <record>
         <invoice_source>EMAIL</invoice_source>
         <invoice_capture_date>2022-11-18</invoice_capture_date>
         <document_type>INVOICE</document_type>
         <data_capture_provider_code>00001</data_capture_provider_code>
         <data_capture_provider_reference>1264</data_capture_provider_reference>
         <document_capture_provide_code>00002</document_capture_provide_code>
         <document_capture_provider_ref>1264</document_capture_provider_ref>
         <rows />
      </record>
   </objects>
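
Since the question mentions a whole folder of such files, the per-file logic could be wrapped in a loop over the folder. This is a rough sketch rather than a tested tool: unwrap_file, unwrap_folder and the folder names are assumptions made for illustration, and it assumes every file has the <objects>/<object>/<record> layout shown in the question.

```python
from pathlib import Path
import xml.etree.ElementTree as ET

def unwrap_file(src, dst):
    """Parse src, move each <object>'s children up to the root, write dst."""
    doc = ET.parse(str(src))
    root = doc.getroot()
    for obj in root.findall("object"):   # findall returns a list, safe to mutate root
        root.extend(list(obj))           # promote the <record> children
        root.remove(obj)                 # drop the emptied wrapper
    if hasattr(ET, "indent"):            # ET.indent exists only on Python 3.9+
        ET.indent(root)
    doc.write(str(dst), encoding="utf-8", xml_declaration=True)

def unwrap_folder(in_dir, out_dir):
    """Process every *.xml file in in_dir and write the results to out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(Path(in_dir).glob("*.xml")):
        dst = out / src.name
        unwrap_file(src, dst)
        written.append(dst)
    return written
```

A call such as unwrap_folder("invoices", "invoices_flat") (both folder names are placeholders) writes the converted copies to a separate folder, so the original files stay untouched if something goes wrong.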

