
Python: Most efficient way to update an XML file in lxml?

I have a very large XML file containing data about network devices. Every time the program iterates, it will modify one network_device entry while leaving the rest of the file alone. I am trying to find the most efficient way to:

  1. Add a new network_device element and append it to the existing XML file without entirely rewriting it (since I assume that is far more resource intensive)
  2. Change an existing network_device element, also while remaining as resource friendly as possible.

Every example I've seen so far loads the entire XML file into memory as an ElementTree object, edits the tree, and then writes the tree back to a file. At up to a few hundred megabytes per file, this is a very intensive process.
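For reference, the load-edit-write pattern described above might look like the following minimal sketch. It uses the standard library's `xml.etree.ElementTree` so that it runs without extra dependencies; `lxml.etree` exposes the same `parse`, `find`, `SubElement` and `write` calls. The function name `upsert_device`, the trimmed-down sample document and the temporary file are assumptions made for illustration only.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Trimmed-down version of the sample document, for illustration only.
SAMPLE = """<main>
<network_device hash="1cdf045c">
    <hostname>CNMASAS02</hostname>
    <management_ip>10.1.1.1</management_ip>
</network_device>
</main>
"""


def upsert_device(path, hostname, management_ip):
    """Load the whole file, change or add one device, write the whole file back.

    Returns True if a new network_device element was appended.
    """
    tree = ET.parse(path)              # the entire document is now in memory
    root = tree.getroot()
    created = False
    for device in root.iter("network_device"):
        if device.findtext("hostname") == hostname:
            break                      # found the entry to modify
    else:
        # No matching entry: append a new one at the end of <main>.
        device = ET.SubElement(root, "network_device")
        ET.SubElement(device, "hostname").text = hostname
        created = True
    ip = device.find("management_ip")
    if ip is None:
        ip = ET.SubElement(device, "management_ip")
    ip.text = management_ip
    # The whole tree is serialized again, however small the change was.
    tree.write(path, encoding="utf-8", xml_declaration=True)
    return created


# Demo on a temporary copy of the sample document.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "devices.xml")
with open(path, "w", encoding="utf-8") as fh:
    fh.write(SAMPLE)

upsert_device(path, "CNMASAS02", "10.1.1.9")   # change an existing entry
upsert_device(path, "CNMASAS03", "10.1.1.2")   # append a new entry
```

Both calls parse and rewrite the complete file, which is the cost the question is trying to avoid.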

I am using the lxml library to do this, but I am not tied to it if there's something better.


<main>
<network_device updated="1/14/2017 10:02:45" checked="1/30/2017 18:55:30" hash="1cdf045c">
    <hostname>CNMASAS02</hostname>
    <management_ip>10.1.1.1</management_ip>
    <serials>
        <serial type="ABCD1234" hardware="somehardware" serial="XYZ1234567890"/>
        <boot></boot>
    </serials>
    <cdp_neighbors>
        <neighbor added="1/14/2017 10:02:45" ip="10.2.2.2" hostname="somedevice" platform="cisco_ios"/>
        <neighbor added="1/14/2017 10:02:45" ip="10.2.2.2" hostname="somedevice" platform="cisco_ios"/>
    </cdp_neighbors>
    <interfaces>
    </interfaces>
</network_device>

<network_device updated="1/14/2017 10:02:45" checked="1/30/2017 18:55:30" hash="1frgd432">
    <hostname>CNMASAS03</hostname>
    <management_ip>10.1.1.2</management_ip>
    <error_code>#8: Could not access IP address to poll host.</error_code>
</network_device>
</main>

XML is a text format, so its content is laid out sequentially with no spare room for in-place modification. Any update therefore involves reading the file, making the modifications, and then writing the entire file back out. The only way to improve on this is to split the records into separate files using XInclude or external document entities. You may still have to read the whole document, but you only have to rewrite the file containing the altered nodes. That takes more coding, but that's often the price of efficiency. I'm working on a binary, n-dimensional XML format that would be more efficient for this sort of thing, but it would require far more coding.
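One way the XInclude split could look is sketched below. It is an illustration under assumptions, not a definitive implementation: the layout (an index file `main.xml` holding one `xi:include` per device, and one small file per device under `devices/`), the helper names `write_device`, `add_to_index` and `load_all`, and the use of the hostname as a file name are all hypothetical. The sketch uses the standard library's `xml.etree.ElementInclude`; with lxml, calling `xinclude()` on a parsed tree performs the same expansion.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI_NS = "http://www.w3.org/2001/XInclude"
ET.register_namespace("xi", XI_NS)   # serialize includes with the xi: prefix


def device_path(root_dir, hostname):
    # Hypothetical layout: one small file per device, named after its hostname.
    return os.path.join(root_dir, "devices", hostname + ".xml")


def add_to_index(root_dir, hostname):
    """Append an xi:include entry to the index file, which stays small."""
    index = os.path.join(root_dir, "main.xml")
    if os.path.exists(index):
        tree = ET.parse(index)
    else:
        tree = ET.ElementTree(ET.Element("main"))
    ref = ET.SubElement(tree.getroot(), "{%s}include" % XI_NS)
    ref.set("href", "devices/" + hostname + ".xml")
    tree.write(index, encoding="utf-8", xml_declaration=True)


def write_device(root_dir, hostname, management_ip):
    """Create or replace one device file; returns True if the device is new."""
    path = device_path(root_dir, hostname)
    is_new = not os.path.exists(path)
    device = ET.Element("network_device")
    ET.SubElement(device, "hostname").text = hostname
    ET.SubElement(device, "management_ip").text = management_ip
    os.makedirs(os.path.dirname(path), exist_ok=True)
    # Only this small file is rewritten when a device changes.
    ET.ElementTree(device).write(path, encoding="utf-8", xml_declaration=True)
    if is_new:
        add_to_index(root_dir, hostname)
    return is_new


def load_all(root_dir):
    """Expand every xi:include to rebuild the full <main> document in memory."""
    def loader(href, parse, encoding=None):
        # Resolve hrefs relative to the index file, not the working directory.
        return ET.parse(os.path.join(root_dir, href)).getroot()

    root = ET.parse(os.path.join(root_dir, "main.xml")).getroot()
    ElementInclude.include(root, loader=loader)
    return root


# Demo in a temporary directory.
root_dir = tempfile.mkdtemp()
write_device(root_dir, "CNMASAS02", "10.1.1.1")   # new: device file plus index entry
write_device(root_dir, "CNMASAS03", "10.1.1.2")   # new: device file plus index entry
write_device(root_dir, "CNMASAS02", "10.1.1.9")   # update: only CNMASAS02.xml is rewritten
full = load_all(root_dir)
```

The trade-off is the one the answer describes: updating a device touches a file of a few hundred bytes instead of the whole document, while any consumer that needs the complete `<main>` tree still has to read and expand every included file.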
