
C# .NET Saxon API throwing OutOfMemoryException

The machine has 4 CPUs and 16 GB of RAM, and we are processing 800 MB and 300 MB XML files. Sometimes the .NET Saxon API throws an OutOfMemoryException with the stack trace below. Looking at the perf stats for the previous few hours, the server appears to have about 10 GB of free memory. The code below is run in parallel tasks using Task.Run(). Please advise.

 DocumentBuilder documentBuilder = processor.NewDocumentBuilder();
 documentBuilder.IsLineNumbering = true;
 documentBuilder.WhitespacePolicy = WhitespacePolicy.PreserveAll;
 // Note: per the stack trace, Build(XmlNode) is used, so the System.Xml DOM
 // and the Saxon TinyTree coexist in memory during the build.
 XdmNode xdmNode = documentBuilder.Build(xmlDocumentToEvaluate);

System.Exception: Error in ExecuteRules method ---> System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
   at net.sf.saxon.tree.tiny.TinyTree.condense(Statistics )
   at net.sf.saxon.tree.tiny.TinyBuilder.close()
   at net.sf.saxon.event.ProxyReceiver.close()
   at net.sf.saxon.pull.PullPushCopier.copy()
   at net.sf.saxon.event.Sender.sendPullSource(PullSource , Receiver , ParseOptions )
   at net.sf.saxon.event.Sender.send(Source source, Receiver receiver, ParseOptions options)
   at net.sf.saxon.Configuration.buildDocument(Source source, ParseOptions parseOptions)
   at net.sf.saxon.Configuration.buildDocument(Source source)
   at Saxon.Api.DocumentBuilder.Build(XmlReader reader)
   at Saxon.Api.DocumentBuilder.Build(XmlNode source)

With an 800 MB input file I think you could start hitting limits other than the actual amount of heap memory available, for example the maximum size of a single array or string. This could be the effect you are seeing. One way the TinyTree saves space is to use a small number of large objects rather than a large number of small objects, which makes it more likely to run into per-object limits.
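If the process is a 64-bit .NET Framework application, one relevant per-object limit is the default 2 GB cap on any single object, which can be lifted with the documented gcAllowVeryLargeObjects runtime setting (the element count of an array is still capped at roughly 2^31, so this only raises the byte limit). A minimal app.config fragment, offered as a thing to try rather than a guaranteed fix:

```xml
<configuration>
  <runtime>
    <!-- 64-bit .NET Framework only: allow single arrays larger than 2 GB.
         The ~2^31 element-count cap still applies. -->
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>
```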

The TinyTree.condense() method (which is where it is failing) is called at the end of tree construction and attempts to reclaim unused space in the arrays used for the TinyTree data structure. This is done by allocating smaller arrays up to the actual size used, and copying data across. So temporarily it needs additional memory, and this is where the failure is occurring. Looking at the code, there's actually an opportunity to reduce the amount of temporary memory needed.
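The shape of the failure can be illustrated with a small sketch (my own illustration, not Saxon's actual code): shrinking an over-allocated array requires the old and new arrays to be alive at the same time, so the peak footprint during the copy exceeds either array on its own.

```csharp
using System;

public class CondenseSketch
{
    // Hypothetical illustration of a condense step: shrink an over-allocated
    // buffer to its used size. While Array.Copy runs, both arrays are alive,
    // so the transient footprint is (capacity + used) elements. With very
    // large TinyTree arrays, that spike is where the OutOfMemoryException
    // can be thrown even though the final, condensed result would fit.
    public static int[] Condense(int[] buffer, int used)
    {
        int[] smaller = new int[used];      // extra allocation happens here
        Array.Copy(buffer, smaller, used);  // old and new arrays coexist
        return smaller;                     // old buffer is freed only after GC
    }

    public static void Main()
    {
        int[] buffer = new int[1000];       // over-allocated during tree building
        int used = 600;                     // entries actually written
        int[] condensed = Condense(buffer, used);
        Console.WriteLine(condensed.Length); // 600
    }
}
```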

If there are a lot of repeated text or attribute values in your data then it could be worth using the "TinyTreeCondensed" option, which attempts to share storage for such duplicated values. But this could be counter-productive if there isn't much duplication, because of the space used for indexing during the tree-building process.
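In recent Saxon .NET releases this is selected through the DocumentBuilder's TreeModel property; a sketch, assuming that property is available in your Saxon version (check the Saxon.Api documentation for the release you are using):

```csharp
using System;
using Saxon.Api;

// Sketch: select the condensed TinyTree variant, which shares storage for
// duplicated text and attribute values at the cost of an index maintained
// while the tree is built.
Processor processor = new Processor();
DocumentBuilder documentBuilder = processor.NewDocumentBuilder();
documentBuilder.TreeModel = TreeModel.TinyTreeCondensed;
XdmNode doc = documentBuilder.Build(new Uri("file:///data/large.xml"));
```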

With data this large, I think it's a good idea to examine alternative strategies. For example: XML databases; streamed processing; splitting the file into multiple files; document projection. It's impossible to advise on this without knowing the big picture about what problem you are trying to solve.
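As one concrete example of the streamed-processing option: if the per-record work can be expressed over a forward-only pass, a plain XmlReader keeps memory flat regardless of file size, because no tree is ever materialised. A minimal sketch, where "record" is a hypothetical element name standing in for whatever unit your data has (Saxon-EE's XSLT 3.0 streaming is the equivalent inside Saxon itself):

```csharp
using System;
using System.IO;
using System.Xml;

public class StreamCount
{
    // Count elements named "record" in a forward-only pass; the reader holds
    // only the current node, so memory use does not grow with document size.
    public static int CountRecords(XmlReader reader)
    {
        int count = 0;
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "record")
                count++;
        }
        return count;
    }

    public static void Main()
    {
        // In real use, pass a file path: XmlReader.Create("large.xml").
        string sample = "<root><record/><record/><record/></root>";
        using (XmlReader reader = XmlReader.Create(new StringReader(sample)))
        {
            Console.WriteLine(CountRecords(reader)); // 3
        }
    }
}
```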
