How to combine large XML files using MSXML SAX in Delphi

Edit: My (incomplete and very rough) XmlLite header translation is available on GitHub

What is the best way to do a simple combination of massive XML documents in Delphi with MSXML without using the DOM? Should I use the COM components SAXReader and XMLWriter, and are there any good examples?

The transformation is a simple combination of all the Contents elements under the root (Container) of many big files (60 MB+) into one huge file (~1 GB).

<Container>
    <Contents />
    <Contents />
    <Contents />
</Container>

I have it working in the following C# code using an XmlWriter and XmlReaders, but it needs to happen in a native Delphi process:

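using System.Xml;

var files = new string[] { @"c:\bigFile1.xml", @"c:\bigFile2.xml", @"c:\bigFile3.xml", @"c:\bigFile4.xml", @"c:\bigFile5.xml", @"c:\bigFile6.xml" };

using (var writer = XmlWriter.Create(@"c:\HugeOutput.xml", new XmlWriterSettings { Indent = true }))
{
    writer.WriteStartElement("Container");

    foreach (var inputFile in files)
        using (var reader = XmlReader.Create(inputFile))
        {
            reader.MoveToContent();   // position on the root <Container>
            reader.Read();            // step into its children
            while (!reader.EOF)
            {
                // WriteNode copies the whole <Contents> subtree and leaves the reader
                // on the node after it, so Read() is only called otherwise; calling
                // Read() after WriteNode would skip an immediately adjacent <Contents>.
                if (reader.IsStartElement("Contents"))
                    writer.WriteNode(reader, true);
                else
                    reader.Read();
            }
        }

    writer.WriteEndElement(); //End the Container element
}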
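For comparison, the push-style (SAX) version of the same combine that the question asks about can be sketched in Python, whose standard library pairs a SAX reader (`xml.sax`) with a streaming writer (`xml.sax.saxutils.XMLGenerator`), much as MSXML pairs its SAX reader with its XML writer. This is an illustration under assumptions, not MSXML or Delphi code: the names `ContentsForwarder` and `combine_sax` are hypothetical, namespaces are ignored, and comments and processing instructions inside `Contents` are dropped.

```python
import xml.sax
from xml.sax.saxutils import XMLGenerator


# Hypothetical helper names; "Container"/"Contents" come from the question.
class ContentsForwarder(xml.sax.ContentHandler):
    """Forwards every <Contents> subtree to a shared XMLGenerator."""

    def __init__(self, out, name="Contents"):
        super().__init__()
        self.out = out
        self.name = name
        self.depth = 0  # > 0 while inside a <Contents> subtree

    def startElement(self, tag, attrs):
        if self.depth or tag == self.name:
            self.depth += 1
            self.out.startElement(tag, attrs)

    def endElement(self, tag):
        if self.depth:
            self.out.endElement(tag)
            self.depth -= 1

    def characters(self, content):
        if self.depth:  # text between <Contents> elements is dropped
            self.out.characters(content)


def combine_sax(inputs, out_stream, root="Container"):
    """Stream the <Contents> elements of every input into one <root>."""
    gen = XMLGenerator(out_stream, encoding="utf-8")
    gen.startDocument()
    gen.startElement(root, {})
    handler = ContentsForwarder(gen)
    for source in inputs:  # file paths or binary file objects
        xml.sax.parse(source, handler)  # push model: the parser calls the handler
    gen.endElement(root)
    gen.endDocument()
```

Usage: `with open("HugeOutput.xml", "w", encoding="utf-8") as out: combine_sax(["bigFile1.xml", "bigFile2.xml"], out)`. Memory stays flat because each event is written out as soon as it is parsed. Nothing is buffered beyond the current event.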

We already use MSXML DOM in other parts of the system and I do not want to add new components if possible.

XmlLite is a native C++ port of the XML reader and writer from System.Xml, and it provides the same pull-parsing programming model. It ships in the box with Windows Server 2003 SP2, Windows XP SP3 and later. You'll need a Delphi header translation, but after that the C# code maps almost 1:1 to Delphi.
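The pull model that XmlLite shares with System.Xml, where the caller asks for the next node instead of receiving callbacks, can be sketched with Python's `xml.dom.pulldom`. Its `expandNode` materializes just the current subtree, much like `XmlReader` plus `WriteNode` in the C# code. This is not XmlLite's API. `combine_pull` is a hypothetical name, and each `Contents` element is assumed small enough to build in memory.

```python
from xml.dom import pulldom


def combine_pull(inputs, out, name="Contents", root="Container"):
    """Hypothetical helper: pull-parse each input, copy its <name> subtrees."""
    out.write('<?xml version="1.0" encoding="utf-8"?>\n')
    out.write("<%s>\n" % root)
    for source in inputs:  # file paths or binary file objects
        events = pulldom.parse(source)
        for event, node in events:  # the caller drives the loop: pull model
            if event == pulldom.START_ELEMENT and node.tagName == name:
                # Build only this subtree (nested events are consumed here),
                # serialize it, and continue pulling from the next node.
                events.expandNode(node)
                out.write(node.toxml())
                out.write("\n")
    out.write("</%s>\n" % root)
```

The output stream is assumed to be a text file opened with `encoding="utf-8"` to match the declaration. Because the loop decides when to advance, skipping to the next `Contents` and copying it in one call is straightforward.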

I'd just use regular file I/O: writeln a <Container> to a text file, writeln each of the contents as a string, and finally writeln </Container>. If the size were more reasonable, I'd assemble everything in a TStringList and then stream that to disk. But in gigabyte territory that would be risky, because the whole output would have to fit in memory at once.
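The plain file I/O idea can be sketched as a byte copy. The sketch writes the opening tag, copies everything between each input's root start tag and its closing `</Container>`, and then writes the closing tag. Python stands in for Delphi here, and the assumptions are strong:

* Every input is UTF-8, matching the output declaration.
* The root element is literally `Container`, and any namespace declarations on it are lost.
* No comment or DOCTYPE before the root contains `<Container`.
* Each single input (about 60 MB) fits in memory.

`combine_text` is a hypothetical name.

```python
import re

# Root start tag such as <Container> or <Container attr="...">; a self-closing
# <Container/> does not match and is skipped below.
_ROOT_OPEN = re.compile(rb"<Container(\s[^>]*)?>")
_ROOT_CLOSE = b"</Container>"


def combine_text(paths, out_path):
    """Hypothetical helper: copy raw bytes between the root tags of each input."""
    with open(out_path, "wb") as out:
        out.write(b'<?xml version="1.0" encoding="utf-8"?>\n<Container>\n')
        for path in paths:
            with open(path, "rb") as f:
                data = f.read()  # assumes one input file fits in memory
            start = _ROOT_OPEN.search(data)
            end = data.rfind(_ROOT_CLOSE)
            if start is None or end < start.end():
                continue  # empty root or unexpected layout: nothing to copy
            out.write(data[start.end():end])
        out.write(b"</Container>\n")
```

Nothing is parsed, so this is the cheapest option, but malformed input is copied through unchecked.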

libxml with the Delphi wrapper Libxml2 might be an option (found here). It has some SAX support and seems to be very solid: the web page mentions that libxml2 passed all 1800+ tests from the OASIS XML Tests Suite. See also: Is there a SAX Parser for Delphi and Free Pascal?

Posting this as an answer because it needs some space and formatting.

I've got one baaad data file for tests; see the commit message at https://github.com/the-Arioch/omnixml/commit/d1a544048e86921983fced67c772944f12cb1427

Here, OmniXML kind of sucks in an XE2 debug build:

  • About 25% more memory use than TXMLDocument/MSXML, maybe even more after fixing the .NextSibling issue (not re-tested).
  • Longer file loading time (on the other hand, significantly faster reading of node properties: they are already Delphi-typed variables, with no crossing of the MSXML/Delphi boundary).
  • Absolutely no support for namespaces, which makes recognizing tags much harder.
  • XPath support in an embryonic state, again lacking namespaces.
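Namespace support matters for recognizing tags because a namespace-aware parser reports each element as a (namespace URI, local name) pair. As a result, `<c:Contents>` and `<x:Contents>` bound to the same URI count as the same element regardless of prefix, while a namespace-blind parser only sees the prefixed string. Below is a minimal sketch with Python's SAX and its namespaces feature enabled. The names `ContentsCounter` and `count_contents` are made up for illustration.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class ContentsCounter(ContentHandler):
    """Hypothetical handler: counts elements by (namespace URI, local name)."""

    def __init__(self, uri, local="Contents"):
        super().__init__()
        self.target = (uri, local)  # uri is None for "no namespace"
        self.count = 0

    def startElementNS(self, name, qname, attrs):
        # name is a (uri, localname) tuple; the prefix in the file is irrelevant.
        if name == self.target:
            self.count += 1


def count_contents(data, uri):
    """Hypothetical helper: count <Contents> elements in namespace `uri`."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)  # report (uri, local) pairs
    handler = ContentsCounter(uri)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.count
```

Take a document that declares both `c` and `x` as prefixes for a made-up URI such as `urn:example:contents`. For that URI the count covers both `c:Contents` and `x:Contents`. An unprefixed `Contents` with no default namespace matches only when `uri` is `None`.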

https://docs.google.com/spreadsheets/d/1QcFVwh3fFfaDyRmv2b-n4Rq4_u5p42UfNbR_FZgZizY/edit?usp=sharing
