Read from a huge MemoryStream in C#

I use a BinaryReader(new MemoryStream(MyByteArray)) to read variable-sized records and process them all in memory. This works well as long as the byte stream in the array is smaller than about 1.7 GB. Beyond that I cannot create a larger byte array, even though I have enough physical memory (a .NET byte array is indexed by a 32-bit int, so it tops out just under 2 GB even on my 64-bit system). So my solution has been to read the byte stream and split it into several byte arrays.

Now, however, I cannot "read" across the byte array boundaries, and because my data is in a variable-length format, I cannot ensure that each byte array ends on a whole record.
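One way to keep the data in several in-memory arrays and still read across their boundaries is to hide the chunks behind a single Stream, so BinaryReader never sees where one array ends and the next begins. This is a minimal sketch under stated assumptions: the chunks are already filled and are not modified while being read, and ChunkedReadStream is a hypothetical name, not a framework class.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: a read-only, seekable Stream that presents several byte[]
// chunks as one continuous sequence, so BinaryReader can read a record that starts
// near the end of one chunk and finishes in the next.
public sealed class ChunkedReadStream : Stream
{
    private readonly IReadOnlyList<byte[]> _chunks;
    private readonly long _length;
    private long _position;

    public ChunkedReadStream(IReadOnlyList<byte[]> chunks)
    {
        _chunks = chunks ?? throw new ArgumentNullException(nameof(chunks));
        foreach (byte[] chunk in chunks) _length += chunk.Length;
    }

    public override bool CanRead => true;
    public override bool CanSeek => true;
    public override bool CanWrite => false;
    public override long Length => _length;

    public override long Position
    {
        get => _position;
        set
        {
            if (value < 0) throw new ArgumentOutOfRangeException(nameof(value));
            _position = value;
        }
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        // Find the chunk that contains the current position (linear scan).
        int index = 0;
        long chunkStart = 0;
        while (index < _chunks.Count && chunkStart + _chunks[index].Length <= _position)
        {
            chunkStart += _chunks[index].Length;
            index++;
        }

        // Copy from this chunk and the following ones until count bytes are copied
        // or the data runs out; fewer bytes are returned only at the end of the data.
        int total = 0;
        while (count > 0 && index < _chunks.Count)
        {
            byte[] chunk = _chunks[index];
            int inChunk = (int)(_position - chunkStart);
            int n = Math.Min(count, chunk.Length - inChunk);
            Buffer.BlockCopy(chunk, inChunk, buffer, offset, n);
            offset += n;
            count -= n;
            total += n;
            _position += n;
            chunkStart += chunk.Length;
            index++;
        }
        return total;
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        Position = origin switch
        {
            SeekOrigin.Begin => offset,
            SeekOrigin.Current => _position + offset,
            SeekOrigin.End => _length + offset,
            _ => throw new ArgumentOutOfRangeException(nameof(origin)),
        };
        return _position;
    }

    public override void Flush() { }
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

Usage would be new BinaryReader(new ChunkedReadStream(myChunks)), after which ReadInt32 and friends work even when a value straddles two arrays. Each Read scans the chunk list to find its starting chunk, which is cheap when there are only a handful of large chunks.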

This must be a common problem for anyone who processes very large datasets and still needs speed.

How do I handle this problem?

Edit: Reading up on the basics, I realize that memory-mapped files might be slower than normal I/O for sequential access.

Have you tried something like this:

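using System.IO;

// Open the file for front-to-back reading with a 16 KB buffer;
// FileOptions.SequentialScan tells the OS that access will be sequential.
var stream = new FileStream("data",
    FileMode.Open,
    FileAccess.Read,
    FileShare.Read,
    16 * 1024,
    FileOptions.SequentialScan);

var reader = new BinaryReader(stream);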
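With a FileStream-backed reader like the one above, the array-boundary problem goes away, because FileStream refills its internal buffer behind the scenes. Here is a sketch of the read loop, assuming a hypothetical record layout (an Int32 length prefix, as BinaryWriter writes it, followed by the payload); RecordFile and ReadRecords are illustrative names.

```csharp
using System.Collections.Generic;
using System.IO;

public static class RecordFile
{
    // Hypothetical record layout: an Int32 length prefix (as BinaryWriter writes it)
    // followed by that many payload bytes.
    public static IEnumerable<byte[]> ReadRecords(string path)
    {
        using var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
            FileShare.Read, 16 * 1024, FileOptions.SequentialScan);
        using var reader = new BinaryReader(stream);
        long end = stream.Length;

        // FileStream refills its 16 KB buffer on demand, so a record that straddles
        // a buffer boundary is read exactly like any other record.
        while (stream.Position < end)
        {
            int length = reader.ReadInt32();
            byte[] payload = reader.ReadBytes(length);
            if (payload.Length != length)
                throw new EndOfStreamException("Truncated record at end of file.");
            yield return payload;
        }
    }
}
```

Iterating with foreach (byte[] record in RecordFile.ReadRecords("data")) keeps only the current record in memory, so the file size is no longer bounded by the largest array you can allocate.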

If your data resides in a file and you can use .NET 4.0, consider using MemoryMappedFile.

You can then either use a MemoryMappedViewStream to get a stream, or use a MemoryMappedViewAccessor to get a BinaryReader-like interface.
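The two options can be sketched as follows, assuming a hypothetical length-prefixed record layout; MappedRecords, ReadFirstLength and CountRecords are illustrative names. The accessor reads values at absolute byte offsets, while the view stream is an ordinary Stream that BinaryReader can wrap unchanged. Because the view may be rounded up to a whole page, the loop compares against the file's real length rather than the stream's Length.

```csharp
using System.IO;
using System.IO.MemoryMappedFiles;

public static class MappedRecords
{
    // Hypothetical layout: an Int32 length prefix followed by the payload bytes.

    // MemoryMappedViewAccessor: random access by absolute byte offset.
    public static int ReadFirstLength(string path)
    {
        using var mmf = MemoryMappedFile.CreateFromFile(
            path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read);
        using var accessor = mmf.CreateViewAccessor(0, 0, MemoryMappedFileAccess.Read);
        return accessor.ReadInt32(0);
    }

    // MemoryMappedViewStream: a Stream, so BinaryReader works as usual.
    public static int CountRecords(string path)
    {
        // The view can be larger than the file (page rounding), so use the file size.
        long fileLength = new FileInfo(path).Length;
        using var mmf = MemoryMappedFile.CreateFromFile(
            path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read);
        using var view = mmf.CreateViewStream(0, 0, MemoryMappedFileAccess.Read);
        using var reader = new BinaryReader(view);
        int count = 0;
        while (view.Position < fileLength)
        {
            int length = reader.ReadInt32();
            view.Seek(length, SeekOrigin.Current);  // skip the payload without copying it
            count++;
        }
        return count;
    }
}
```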

For excessively large streams, you shouldn't try to dump the data into a MemoryStream; use something like FileStream instead and talk directly to disk. The built-in buffering is usually sufficient, or you can tweak it with something like BufferedStream (although I have rarely needed to; then again, I tend to include my own data-processing buffer).
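The answer does not show what its "own data-processing buffer" looks like. One common shape, sketched here as an assumption rather than the answerer's actual code, reads large blocks into a single reusable array, hands out every complete record, and moves a trailing partial record to the front before the next read. That also covers the record-spans-a-boundary problem from the question without ever allocating one huge array. BufferedRecordScanner, Scan, and the length-prefixed layout are all hypothetical.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

public static class BufferedRecordScanner
{
    // Hypothetical layout: an Int32 little-endian length prefix, then the payload.
    // Reads large blocks into one reusable buffer and calls onRecord for every
    // complete record. The memory passed to onRecord is only valid during the call,
    // because the buffer is overwritten by later blocks.
    public static int Scan(Stream input, int bufferSize, Action<ReadOnlyMemory<byte>> onRecord)
    {
        if (bufferSize < 4) throw new ArgumentOutOfRangeException(nameof(bufferSize));
        byte[] buffer = new byte[bufferSize];
        int filled = 0;   // number of valid bytes at the start of buffer
        int records = 0;

        while (true)
        {
            int read = input.Read(buffer, filled, buffer.Length - filled);
            filled += read;

            // Hand out every record that is complete in the buffer.
            int pos = 0;
            while (filled - pos >= 4)
            {
                int length = BinaryPrimitives.ReadInt32LittleEndian(buffer.AsSpan(pos, 4));
                if (length < 0) throw new InvalidDataException("Negative record length.");
                if (filled - pos - 4 < length) break;   // payload not fully buffered yet
                onRecord(new ReadOnlyMemory<byte>(buffer, pos + 4, length));
                pos += 4 + length;
                records++;
            }

            // Move the unfinished tail (a partial record) to the front of the buffer.
            filled -= pos;
            Array.Copy(buffer, pos, buffer, 0, filled);

            if (read == 0)
            {
                if (filled != 0) throw new EndOfStreamException("Truncated record at end of input.");
                return records;
            }

            // Grow the buffer when a single record is larger than it.
            if (filled >= 4)
            {
                int needed = 4 + BinaryPrimitives.ReadInt32LittleEndian(buffer.AsSpan(0, 4));
                if (needed > buffer.Length) Array.Resize(ref buffer, needed);
            }
        }
    }
}
```

The design trades a small copy of the leftover bytes per block for never holding more than one block (or one oversized record) in memory, and it does not assume that Stream.Read fills the whole request.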

You might also consider compression or densely packed data, and serializers designed to work by streaming records rather than creating an entire object graph at once (although since you mention BinaryReader, you may already be doing this by hand, so this might not be an issue).
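Compression and record-at-a-time streaming combine naturally, because BinaryReader and BinaryWriter work over any Stream. A sketch using the framework's GZipStream (any compressor exposed as a Stream would do), assuming a hypothetical length-prefixed layout; CompressedRecords is an illustrative name.

```csharp
using System.Buffers.Binary;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static class CompressedRecords
{
    // Writes records gzip-compressed, each as an Int32 length prefix plus payload
    // (hypothetical layout). leaveOpen keeps the caller's stream usable afterwards.
    public static void Write(Stream output, IEnumerable<byte[]> records)
    {
        using var gzip = new GZipStream(output, CompressionLevel.Fastest, leaveOpen: true);
        using var writer = new BinaryWriter(gzip);
        foreach (byte[] record in records)
        {
            writer.Write(record.Length);
            writer.Write(record);
        }
    }

    // Decompresses on the fly and yields one record at a time, so only the
    // current record is materialized, never the whole data set.
    public static IEnumerable<byte[]> Read(Stream input)
    {
        using var gzip = new GZipStream(input, CompressionMode.Decompress, leaveOpen: true);
        using var reader = new BinaryReader(gzip);
        while (true)
        {
            // GZipStream cannot report its length, so detect the end by a short read.
            byte[] prefix = reader.ReadBytes(4);
            if (prefix.Length == 0) yield break;
            if (prefix.Length < 4) throw new EndOfStreamException("Truncated length prefix.");
            int length = BinaryPrimitives.ReadInt32LittleEndian(prefix);
            byte[] payload = reader.ReadBytes(length);
            if (payload.Length != length) throw new EndOfStreamException("Truncated record.");
            yield return payload;
        }
    }
}
```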

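Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.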

© 2020-2024 STACKOOM.COM