
Alternative to MemoryStream for large data volumes

I'm having problems with out-of-memory exceptions when using a .NET MemoryStream if the data is large and the process is 32-bit.

I believe that the System.IO.Packaging API silently switches from memory to file-backed storage as the data volume increases, and on the face of it, it seems possible to implement a subclass of MemoryStream that does exactly the same thing.

Does anyone know of such an implementation? I'm pretty sure there is nothing in the framework itself.

Programmers try too hard to avoid using a file. The difference between memory and a file is a very small one in Windows. Any memory you use for a MemoryStream in fact requires a file: the storage is backed by the paging file, c:\pagefile.sys. And the reverse is true as well: any file you use is backed by memory. File data is cached in RAM by the file system cache. So if the machine has sufficient RAM, you will in fact only read from and write to memory when you use a FileStream, and you get the perf you expect from using memory. It is entirely free; you don't have to write any code to enable this, nor do you have to manage it.

If the machine doesn't have enough RAM, then it deteriorates the same way. When you use a MemoryStream, the paging file starts thrashing and you'll be slowed down by the disk. When you use a file, the data won't fit in the file system cache and you'll be slowed down by the disk.

You'll of course get the benefit of using a file: you won't run out of memory anymore. Use a FileStream instead.
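As a minimal sketch of that advice, a scratch FileStream can stand in where a MemoryStream was used. The `TempStreams` class and `CreateScratchStream` method are hypothetical names for illustration; the sketch assumes a temp file is acceptable and uses `FileOptions.DeleteOnClose` so the file goes away when the stream is disposed.

```csharp
using System;
using System.IO;

// Hypothetical helper (not a framework type): hands out a read/write stream
// backed by a temp file instead of a single in-memory byte[].
public static class TempStreams
{
    public static Stream CreateScratchStream()
    {
        // GetTempFileName creates a zero-byte file and returns its path.
        string path = Path.GetTempFileName();

        // DeleteOnClose asks the OS to remove the file when the stream is closed.
        return new FileStream(
            path,
            FileMode.Open,
            FileAccess.ReadWrite,
            FileShare.None,
            81920,
            FileOptions.DeleteOnClose);
    }
}
```

Callers would use it like any other seekable stream, for example `using (Stream s = TempStreams.CreateScratchStream()) { ... }`, writing the data, resetting `Position` to 0, and reading it back.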

This is expected to happen with MemoryStream, so you should implement your own logic or use some external class. Here is a post that explains the problems with MemoryStream and big data and gives an alternative: A replacement for MemoryStream
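One way to "implement your own logic" might look like the sketch below: a stream that buffers in memory up to a threshold and then moves its contents to a temp file. `SpillOverStream`, its threshold parameter, and `IsFileBacked` are hypothetical names; it derives from Stream rather than MemoryStream, and it is not how System.IO.Packaging or the linked post does it.

```csharp
using System;
using System.IO;

// Hypothetical class, not part of the framework: starts as a MemoryStream and
// switches to a delete-on-close temp file once the data would exceed a threshold.
public sealed class SpillOverStream : Stream
{
    private readonly long _threshold;
    private Stream _inner = new MemoryStream();
    private bool _spilled;

    public SpillOverStream(long thresholdBytes) { _threshold = thresholdBytes; }

    // True once the data has been moved to the temp file.
    public bool IsFileBacked { get { return _spilled; } }

    private void SpillIfNeeded(long requiredLength)
    {
        if (_spilled || requiredLength <= _threshold) return;

        var file = new FileStream(Path.GetTempFileName(), FileMode.Open,
            FileAccess.ReadWrite, FileShare.None, 81920, FileOptions.DeleteOnClose);

        // Copy what is buffered so far, then restore the caller's position.
        long position = _inner.Position;
        _inner.Position = 0;
        _inner.CopyTo(file);
        _inner.Dispose();
        file.Position = position;

        _inner = file;
        _spilled = true;
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        SpillIfNeeded(_inner.Position + count);
        _inner.Write(buffer, offset, count);
    }

    public override void SetLength(long value)
    {
        SpillIfNeeded(value);
        _inner.SetLength(value);
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        return _inner.Read(buffer, offset, count);
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        return _inner.Seek(offset, origin);
    }

    public override void Flush() { _inner.Flush(); }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return true; } }
    public override bool CanWrite { get { return true; } }
    public override long Length { get { return _inner.Length; } }

    public override long Position
    {
        get { return _inner.Position; }
        set { _inner.Position = value; }
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}
```

The threshold is a tuning choice: small payloads never touch the disk, while large ones stop competing for the address space of a 32-bit process.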

We've run into similar obstacles on my team. Some commenters have suggested that developers need to be more okay with using files. If using the filesystem directly is an option, do that, but it's not always an option.

If, as we did, you want to pass data read from a file around your application, you can't pass the FileStream object because it can get disposed before you're done reading the data. We originally resorted to MemoryStreams to let us pass the data around easily, but ran into the same problem.

We've used a couple of different workarounds to mitigate the problem.

Options we've used include:

  • Implement a wrapper class that stores the data in multiple byte[] objects (multiple, since arrays are still limited to int.MaxValue entries) and exposes methods that let you treat them almost like a Stream. We still try to avoid this at all costs.
  • Use some sort of "token" to pass a reference to the location of the data, and wait to load the data "just in time" in the application.
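The "token" option could be sketched roughly as follows, assuming the data lives in a file on disk. `FileDataToken` and its members are hypothetical names, not something from our codebase or the framework; the point is only that the object passed around holds the location, and each consumer opens (and disposes) its own stream when it actually needs the bytes.

```csharp
using System;
using System.IO;

// Hypothetical "token": carries where the data is, not the data itself.
public sealed class FileDataToken
{
    public string FilePath { get; private set; }

    public FileDataToken(string filePath)
    {
        if (filePath == null) throw new ArgumentNullException("filePath");
        FilePath = filePath;
    }

    // Opens a fresh read-only stream "just in time".
    // The caller owns the returned stream and must dispose it.
    public Stream OpenRead()
    {
        return new FileStream(FilePath, FileMode.Open, FileAccess.Read, FileShare.Read);
    }
}
```

Because every call to `OpenRead()` returns an independent stream, one consumer disposing its stream cannot pull the data out from under another.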

I'd suggest checking out this project.

http://www.codeproject.com/Articles/348590/A-replacement-for-MemoryStream

I believe the problem with memory streams comes from the fact that, underneath it all, they are still a fancy wrapper for a single byte[], and so are still constrained by .NET's requirement that all objects must be less than 2 GB, even in 64-bit programs. The above implementation breaks the byte[] into several different byte[]s.
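To illustrate the idea of splitting one large byte[] into several, here is a minimal sketch. `ChunkedBuffer` and the 64 KB chunk size are assumptions made for this example; this is not the code from the linked project, and it only supports appending and indexed reads.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of chunked storage: many small arrays instead of one
// huge contiguous byte[].
public sealed class ChunkedBuffer
{
    // Assumed chunk size; small enough to stay off the large object heap.
    private const int ChunkSize = 64 * 1024;

    private readonly List<byte[]> _chunks = new List<byte[]>();

    public long Length { get; private set; }

    public int ChunkCount { get { return _chunks.Count; } }

    public void Append(byte[] source, int offset, int count)
    {
        while (count > 0)
        {
            // Bytes already used in the last chunk; 0 means a new chunk is needed.
            int used = (int)(Length % ChunkSize);
            if (used == 0) _chunks.Add(new byte[ChunkSize]);

            int n = Math.Min(count, ChunkSize - used);
            Buffer.BlockCopy(source, offset, _chunks[_chunks.Count - 1], used, n);

            offset += n;
            count -= n;
            Length += n;
        }
    }

    public byte this[long index]
    {
        get
        {
            if (index < 0 || index >= Length)
                throw new ArgumentOutOfRangeException("index");
            return _chunks[(int)(index / ChunkSize)][(int)(index % ChunkSize)];
        }
    }
}
```

Since no single allocation has to be contiguous, growth never requires copying the whole buffer into a bigger array, which is where a MemoryStream tends to fail in a fragmented 32-bit address space.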
