I have a very large XML file containing data about network devices. Every time the program iterates, it will modify one network_device entry while leaving the rest of the file alone. I am trying to find the most efficient way to:
Every example I've seen so far loads the entire XML file into memory as an ElementTree object, edits the tree, and then writes the tree back to a file. At up to a few hundred megabytes per file, this is a very expensive process.
I am using the lxml library to do this, but I am not tied to it if there's something better.
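For concreteness, the load-edit-write pattern described above looks roughly like the sketch below. It uses the standard library's `xml.etree.ElementTree` so it runs without extra dependencies; `lxml.etree` offers equivalent `parse`, `iter`, `findtext` and `write` calls. The function name `rewrite_whole_file` and the choice of updating the `checked` attribute are illustrative assumptions, not part of the real program.

```python
import xml.etree.ElementTree as ET


def rewrite_whole_file(path, hostname, new_checked):
    """Load the whole file, edit one network_device, write everything back."""
    tree = ET.parse(path)  # the entire document is held in memory here
    root = tree.getroot()

    # Find the single device entry whose <hostname> matches.
    for device in root.iter("network_device"):
        if device.findtext("hostname") == hostname:
            device.set("checked", new_checked)
            break
    else:
        raise KeyError(hostname)

    # Every byte of the file is rewritten, even though one entry changed.
    tree.write(path, encoding="utf-8", xml_declaration=True)
```

Both the parse and the write scale with the size of the whole file, which is where the cost comes from.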
<main>
  <network_device updated="1/14/2017 10:02:45" checked="1/30/2017 18:55:30" hash="1cdf045c">
    <hostname>CNMASAS02</hostname>
    <management_ip>10.1.1.1</management_ip>
    <serials>
      <serial type="ABCD1234" hardware="somehardware" serial="XYZ1234567890"/>
      <boot></boot>
    </serials>
    <cdp_neighbors>
      <neighbor added="1/14/2017 10:02:45" ip="10.2.2.2" hostname="somedevice" platform="cisco_ios"/>
      <neighbor added="1/14/2017 10:02:45" ip="10.2.2.2" hostname="somedevice" platform="cisco_ios"/>
    </cdp_neighbors>
    <interfaces>
    </interfaces>
  </network_device>
  <network_device updated="1/14/2017 10:02:45" checked="1/30/2017 18:55:30" hash="1frgd432">
    <hostname>CNMASAS03</hostname>
    <management_ip>10.1.1.2</management_ip>
    <error_code>#8: Could not access IP address to poll host.</error_code>
  </network_device>
</main>
XML is a text format, which means its content is laid out sequentially with no spare room for in-place modification. Therefore, any update has to involve reading in the file, making the modifications, and then writing out the entire file. The only way to improve on this is to separate out the records using XInclude or document entities. You may still have to read in the whole document, but you only have to rewrite the part containing the altered nodes. That means more coding, but that's often the price of efficiency. I'm working on a binary, n-dimensional XML format which would be more efficient for things like this but would require far more coding.
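A minimal sketch of the XInclude idea, using only the Python standard library (`xml.etree.ElementTree` and `xml.etree.ElementInclude`). The function names, the `main.xml` file name, and the one-file-per-hostname layout are assumptions made for illustration; it also assumes hostnames are unique and safe to use as file names.

```python
import os
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI_NS = "http://www.w3.org/2001/XInclude"


def split_devices(src_path, out_dir):
    """One-time step: write each network_device to its own file and
    build a small main.xml that only holds xi:include references."""
    os.makedirs(out_dir, exist_ok=True)
    ET.register_namespace("xi", XI_NS)
    main = ET.Element("main")
    for _, elem in ET.iterparse(src_path, events=("end",)):
        if elem.tag == "network_device":
            # Assumption: hostname is unique and usable as a file name.
            name = elem.findtext("hostname") + ".xml"
            elem.tail = None  # drop whitespace that followed the element
            ET.ElementTree(elem).write(os.path.join(out_dir, name), encoding="utf-8")
            ET.SubElement(main, "{%s}include" % XI_NS, href=name)
            elem.clear()  # release the children we no longer need
    main_path = os.path.join(out_dir, "main.xml")
    ET.ElementTree(main).write(main_path, encoding="utf-8", xml_declaration=True)
    return main_path


def update_one_device(out_dir, hostname, new_checked):
    """Per-iteration step: read and rewrite only the one small file."""
    path = os.path.join(out_dir, hostname + ".xml")
    tree = ET.parse(path)
    tree.getroot().set("checked", new_checked)
    tree.write(path, encoding="utf-8")


def load_all(main_path):
    """Reassemble the full document by expanding the xi:include elements."""
    base = os.path.dirname(main_path)

    def loader(href, parse, encoding=None):
        # Resolve each href relative to the directory holding main.xml.
        return ElementInclude.default_loader(os.path.join(base, href), parse, encoding)

    root = ET.parse(main_path).getroot()
    ElementInclude.include(root, loader=loader)
    return root
```

With this layout, each iteration touches only one small file, and the full document is only assembled when something actually needs all of it. The trade-off is that any tool reading `main.xml` has to be XInclude-aware.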