I have a very large XML file (almost 1 GB) that I need to split into 3 smaller files, all with the same header. I want to do this in Python.

Question

I am using the code below to open the file, but it won't open because the file is too large.

from xml.dom import minidom

# minidom.parse builds the whole DOM in memory before returning
with open("C:\\Users\\samue\\OneDrive\\Desktop\\mopar.xml", "r", encoding="utf8") as test_file:
    xmldoc = minidom.parse(test_file)

def printNode(node):
    # Print this node, then recurse into its children
    print(node)
    for child in node.childNodes:
        printNode(child)

printNode(xmldoc.documentElement)

Answer 1

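You didn't paste an error message or traceback, but I would guess that your code fails on the second or third line.

Have you tried parsing your XML file with xml.etree.ElementTree? (On older Python versions this was also available as xml.etree.cElementTree; that alias was removed in Python 3.9.)

For example, with the code below you can measure how long ET takes to parse your XML file.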

import time
import xml.etree.ElementTree as ET  # xml.etree.cElementTree was removed in Python 3.9

xml_file_name = "mopar.xml"  # placeholder: set this to the path of your XML file

def read_xml_file(xml_file, element):
    """
    Parse the XML file with xml.etree.ElementTree and return the number of
    elements matching `element`, formatted with thousands separators.
    """
    tree = ET.parse(xml_file)
    root = tree.getroot()
    number_of_element = len(root.findall(element))
    return '{:,.0f}'.format(number_of_element)

start_time = time.perf_counter()
counter = read_xml_file(xml_file_name, 'ProteinEntry/header')  # the element path here depends on your XML header tag
end_time = time.perf_counter()
total_time = round(end_time - start_time, 2)
print(f'xml.etree.ElementTree - Total time taken: [{total_time}] seconds to identify the number of elements: [{counter}]')
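
Note that ET.parse still loads the whole tree into memory. For the split itself, a streaming approach with ET.iterparse may work better on a file this size. What follows is only a minimal sketch under some assumptions of mine, not something tested against your file: I assume "the same header" means the XML declaration plus the root element and its attributes, and that every direct child of the root is one record. The function name split_xml and the round-robin distribution are my own choices for illustration. Namespace prefixes, comments, and whitespace between records may not be preserved exactly.

```python
import xml.etree.ElementTree as ET


def split_xml(src_path, out_paths):
    """Stream src_path and deal the root's direct children round-robin into out_paths."""
    context = ET.iterparse(src_path, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element

    # Serialize an empty copy of the root to get matching opening/closing tags,
    # so every output file starts with the same root element and attributes.
    shell = ET.tostring(ET.Element(root.tag, dict(root.attrib)),
                        encoding="unicode", short_empty_elements=False)
    cut = shell.rindex("</")
    open_tag, close_tag = shell[:cut], shell[cut:]

    outs = [open(path, "w", encoding="utf-8") for path in out_paths]
    try:
        for out in outs:
            out.write('<?xml version="1.0" encoding="utf-8"?>\n' + open_tag + "\n")

        depth = 0  # nesting depth below the root
        count = 0  # records written so far
        for event, elem in context:
            if event == "start":
                depth += 1
                continue
            depth -= 1
            if depth == 0:  # a direct child of the root has just been closed
                elem.tail = None  # drop whitespace that followed the record
                outs[count % len(outs)].write(
                    ET.tostring(elem, encoding="unicode") + "\n")
                count += 1
                root.clear()  # release the records parsed so far

        for out in outs:
            out.write(close_tag + "\n")
    finally:
        for out in outs:
            out.close()
    return count
```

For the file in the question, a call might look like split_xml("mopar.xml", ["part1.xml", "part2.xml", "part3.xml"]). Dealing records round-robin means the file is read only once and the record count does not have to be known in advance; if each part must hold a contiguous third of the records, you would need to count them first (for example with a separate iterparse pass) and switch output files at the boundaries.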
