![](/img/trans.png)
[英]I would like to split a huge file into many number of files with the header in all split files. Using python
[英]I have a very large xml file (almost 1 gb) I need to split the xml file into 3 smaller files. All with the same headers. I would like to do in Python
我正在使用下面的代码打开文件,但它不会打开,因为它太大了。
from xml.dom import minidom
Test_file = open("C:\\Users\\samue\\OneDrive\\Desktop\\mopar.xml","r", encoding="utf8")
xmldox = minidom.parse(Test_file)
Test_file.close()
def printNode(node):
print (node)
for child in node.childNodes:
printNode(child)
printNode(xmldoc.documentElement)
尽管我没有看到您粘贴的调用堆栈之类的错误消息,但我想您的代码可能在第二行或第三行失败。
您是否尝试过通过 xml.etree.cElementTree 解析您的 xml 文件?
例如,使用下面的代码,您可以知道 ET 解析您的 XML 文件的时间。
import os
import time
import xml.etree.cElementTree as ET
def read_xml_file(xml_file, element):
"""
Parse the xml file to xml.etree.cElementTree
"""
tree = ET.parse(xml_file)
root = tree.getroot()
number_of_element = len(root.findall(element))
return '{:,.0f}'.format(number_of_element)
start_time = time.perf_counter()
counter = read_xml_file(xml_file_name, 'ProteinEntry/header') # the element here depends on your XML header tag
end_time = time.perf_counter()
total_time = round(end_time - start_time, 2)
print(f'xml.etree.cElementTree - Total time taken:[{total_time}] seconds to identify the number of elements: [{counter}]')
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.