
Cleaning up memory after reading a giant xml element value

I rarely turn here for help, but this is driving me crazy: I'm reading an xml file that wraps an arbitrary number of items, each with a b64-encoded file (and some accompanying metadata for it). Originally I just read the whole file into an XmlDocument, but while that was much cleaner code, I realized there's no limit on the size of the file, and XmlDocument eats a lot of memory and can run out if the file is large enough. So I rewrote the code to instead use XmlTextReader, which works great if the issue is that the program was sent an xml file with a large number of reasonably-sized attachments... but there's still a big problem, and that's where I turn to you:

If my xml reader is at a File element whose value is enormous (say, 500MB) and I call reader.ReadElementContentAsString(), I now have a string that occupies 500MB (or possibly an OutOfMemoryException). What I would like to do in either case is just write to a log, "that file attachment was totally way too big, we're going to ignore it and move on", then move on to the next file. But it doesn't appear that the string I just tried to read is ever garbage collected, so what actually happens is that the string takes up all the RAM, and every other file it tries to read after that also throws an OutOfMemoryException, even though most of the files will be quite small.
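Here's roughly what the reading loop looks like; the element name ("File") and the file path are simplified placeholders, not my actual schema:

```csharp
// Simplified sketch of the current approach; "File" and the path are placeholders.
using System;
using System.Xml;

class AttachmentReader
{
    static void Main()
    {
        using (var reader = new XmlTextReader("attachments.xml"))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "File")
                {
                    try
                    {
                        // For a 500MB base64 payload this allocates one enormous
                        // string (or throws OutOfMemoryException outright).
                        string base64 = reader.ReadElementContentAsString();
                        // ... decode and process the attachment ...
                    }
                    catch (OutOfMemoryException)
                    {
                        // Log "attachment too big" and move on -- but the memory
                        // never seems to come back for the next file.
                    }
                }
            }
        }
    }
}
```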

Recall: at this point, I'm reading the element's value into a local string, so I would have expected it to be eligible for garbage collection immediately (and that it would thus be garbage collected, at the latest, when the program attempts to read the next item and discovers it has no memory available). But I've tried everything, just in case: setting the string to null, calling an explicit GC.Collect()... no dice. Task Manager indicates the GC only collected about 40k of the ~500MB it just requested to store the string in, and I still get out-of-memory exceptions attempting to read anything else.

There doesn't seem to be any way to know the length of the value contained in an xml element using XmlTextReader without reading that element, so I imagine I'm stuck reading the string... am I missing something, or is there really no way to read a giant value from an xml file without totally destroying your program's ability to do anything further afterwards? I'm going insane with this.

I have read a bit about C#'s GC, and the LOH, but nothing I read would have indicated to me that this would happen...

Let me know if you need any further information, and thanks!

edit: I did realize that the process was running as a 32-bit process, which meant it was being starved for memory a bit more than it should have been. Having fixed that, this is less of an issue, but it is still behavior I'd like to fix. (It takes more and/or larger files to reach the point where an OutOfMemoryException is thrown, but once it is thrown, I still can't seem to reclaim that memory in a timely fashion.)

I had a similar issue with a SOAP service used to transfer large files as base64 strings.

I used XDocument instead of XmlDocument back then; that did the trick for me.

You can use the XmlReader.ReadValueChunk method to read the contents of an element one "chunk" at a time instead of trying to read the whole content at once. This way you can, for example, decide at some point that the data is too large, then ignore it and log the event. A StringBuilder is probably the best way to combine the collected char array chunks into one string.
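Something along these lines, as a rough sketch; the element name, the buffer size and the 50M-character cap are just assumptions for illustration (it also assumes a reader for which CanReadValueChunk is true, e.g. one created with XmlReader.Create):

```csharp
using System;
using System.Text;
using System.Xml;

static class ChunkedRead
{
    const int MaxChars = 50 * 1024 * 1024; // give up beyond ~50M chars (assumption)

    // Assumes the reader is currently positioned on the <File> start element.
    // Returns the element text, or null if it was too large (after logging).
    public static string ReadFileElementOrNull(XmlReader reader)
    {
        var sb = new StringBuilder();
        var buffer = new char[8192];
        bool tooBig = false;

        reader.Read();                                   // move from <File> to its content
        while (reader.NodeType == XmlNodeType.Text ||
               reader.NodeType == XmlNodeType.CDATA ||
               reader.NodeType == XmlNodeType.Whitespace)
        {
            int read;
            while ((read = reader.ReadValueChunk(buffer, 0, buffer.Length)) > 0)
            {
                if (!tooBig && sb.Length + read > MaxChars)
                {
                    Console.Error.WriteLine("Attachment too large, ignoring it.");
                    sb.Clear();
                    tooBig = true;                       // keep draining, stop storing
                }
                if (!tooBig)
                    sb.Append(buffer, 0, read);
            }
            reader.Read();                               // next content node or </File>
        }
        return tooBig ? null : sb.ToString();
    }
}
```

The point is that the only per-chunk allocation is the small buffer; once you decide the content is too large, you stop appending and just drain the rest, so nothing remotely close to the full 500MB string is ever held.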

If you want to release memory with GC.Collect(), you can force immediate finalization and memory release with GC.WaitForPendingFinalizers(). This may affect performance (or even hang; see the description behind the link), but you should get rid of the large objects, assuming you don't have any live references to them anymore (i.e. the local variables are already out of scope or their value is set to null), and can continue operations normally. You should of course use this only as a last resort, when memory consumption is an issue and you really want to force getting rid of the excess memory allocations.

I have successfully used the GC.Collect(); GC.WaitForPendingFinalizers(); combination in a memory-sensitive environment to keep the memory footprint of an application well under 100MB, even when it reads through some really large XML files (>100MB). To improve performance I also used Process.PrivateMemorySize64 to track memory consumption and forced finalization only after a certain limit was reached. Before my improvements, memory consumption sometimes rose over 1GB!
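Roughly like this; the 300MB threshold is just an example value, not what I actually used:

```csharp
using System;
using System.Diagnostics;

static class MemoryGuard
{
    const long LimitBytes = 300L * 1024 * 1024; // example threshold

    // Force a full collection only when private memory has grown past the limit.
    public static void CollectIfNeeded()
    {
        using (var proc = Process.GetCurrentProcess())
        {
            if (proc.PrivateMemorySize64 > LimitBytes)
            {
                GC.Collect();
                GC.WaitForPendingFinalizers(); // let finalizers run now
                GC.Collect();                  // reclaim anything they released
            }
        }
    }
}
```

You could call something like MemoryGuard.CollectIfNeeded() after each attachment is processed, so the forced collections stay rare.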

I am not positive this is the case, but I think you need to dispose of the XmlTextReader. Save the xmlpath of the node after the excessively large node to a string, set your massive string to null, then dispose of the XmlTextReader and reopen it at the node after the large node. From what I understand, if you set your string to null, or it goes out of scope, the GC should free that memory asap. It seems more likely to me that you're freeing the string but continuing to do operations with the XmlTextReader, which is now holding onto a ton of memory.
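Something like this sketch of the reopen-and-fast-forward part; counting <File> elements is just one assumed way of remembering where you were:

```csharp
using System.Xml;

static class ReaderReopen
{
    // Dispose the old reader first, then call this to get a fresh reader
    // positioned on the first <File> element you haven't processed yet.
    // The caller is responsible for disposing the returned reader.
    public static XmlReader ReopenAfter(string path, int filesAlreadyRead)
    {
        var reader = XmlReader.Create(path);
        int skipped = 0;
        bool onNode = reader.Read();
        while (onNode)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "File")
            {
                if (skipped == filesAlreadyRead)
                    return reader;      // positioned on the next unread <File>
                reader.Skip();          // jump past an already-processed <File>
                skipped++;
                onNode = !reader.EOF;   // Skip() already advanced the reader
            }
            else
            {
                onNode = reader.Read();
            }
        }
        return reader;                  // no unread <File> elements remain
    }
}
```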

Another idea that came to mind was to try doing this within an unsafe block and then freeing the memory explicitly; however, it doesn't look like that's possible (someone else might know, but after looking around a bit it seems memory allocated in an unsafe block is still GC'd, it just gives you pointers). Yet another option, although IMO a terrible one, would be to write a dll for the parsing in C or C++ and call it from your C# project.

Try the first suggestion before doing anything crazy like the last one :)
