
C# MemoryStream & GZipInputStream: Can't .Read more than 256 bytes


I'm having a problem with writing an uncompressed GZIP stream using SharpZipLib's GZipInputStream. I only seem to be able to get 256 bytes worth of data, with the rest not being written to and left zeroed. The compressed stream (compressedSection) has been checked and all data is there (1500+ bytes). The snippet of the decompression process is below:

int msiBuffer = 4096;
using (Stream msi = new MemoryStream(msiBuffer))
{
    msi.Write(compressedSection, 0, compressedSection.Length);
    msi.Position = 0;
    int uncompressedIntSize = AllMethods.GetLittleEndianInt(uncompressedSize, 0); // Gets little endian value of uncompressed size into an integer

    // SharpZipLib GZip method called
    using (GZipInputStream decompressStream = new GZipInputStream(msi, uncompressedIntSize))
    {
        using (MemoryStream outputStream = new MemoryStream(uncompressedIntSize))
        {
            byte[] buffer = new byte[uncompressedIntSize];
            decompressStream.Read(buffer, 0, uncompressedIntSize); // Stream is decompressed and read         
            outputStream.Write(buffer, 0, uncompressedIntSize);
            using (var fs = new FileStream(kernelSectionUncompressed, FileMode.Create, FileAccess.Write))
            {
                fs.Write(buffer, 0, buffer.Length);
                fs.Close();
            }
            outputStream.Close();
        }
        decompressStream.Close();
    }
}

So in this snippet:

1) The compressed section is passed in, ready to be decompressed.

2) The expected size of the uncompressed output (which is stored in a header with the file as a 2-byte little-endian value) is passed through a method to convert it to an integer. The header is removed earlier as it is not part of the compressed GZIP file.

3) SharpZipLib's GZIP stream is declared with the compressed file stream (msi) and a buffer equal to int uncompressedIntSize (have tested with a static value of 4096 as well).

4) I set up a MemoryStream to handle writing the output to a file, as GZipInputStream doesn't have Read/Write; it takes the expected decompressed file size as the argument (capacity).

5) The Read/Write of the stream needs a byte[] array as the first argument, so I set up a byte[] array with enough space to take all the bytes of the decompressed output (3584 bytes in this case, derived from uncompressedIntSize).

6) GZipInputStream decompressStream uses .Read with the buffer as the first argument, from offset 0, using uncompressedIntSize as the count. Checking the arguments here, the buffer array still has a capacity of 3584 bytes but has only been given 256 bytes of data. The rest are zeroes.

It looks like the output of .Read is being throttled to 256 bytes, but I'm not sure where. Is there something I've missed with the streams, or is this a limitation of .Read?

You need to loop when reading from a stream; the lazy way is probably:

decompressStream.CopyTo(outputStream);

(but this doesn't guarantee to stop after uncompressedIntSize bytes - it'll try to read to the end of decompressStream)

A more manual version (that respects an imposed length limit) would be:

// requires: using System.Buffers;
const int BUFFER_SIZE = 1024; // whatever
var buffer = ArrayPool<byte>.Shared.Rent(BUFFER_SIZE);
try
{
    int remaining = uncompressedIntSize, bytesRead;
    while (remaining > 0 && // more to do, and making progress
        (bytesRead = decompressStream.Read(
        buffer, 0, Math.Min(remaining, buffer.Length))) > 0)
    {
        outputStream.Write(buffer, 0, bytesRead);
        remaining -= bytesRead;
    }
    if (remaining != 0) throw new EndOfStreamException();
}
finally
{
    ArrayPool<byte>.Shared.Return(buffer);
}

The issue turned out to be an oversight I'd made earlier in the posted code:

The file I'm working with has 27 sections which are GZipped, but they each have a header which will break the GZip decompression if the GZipInputStream hits any of them. When opening the base file, it was starting from the beginning (adjusted by 6 to avoid the first header) each time instead of going to the next post-header offset:

brg.BaseStream.Seek(6, SeekOrigin.Begin);

Instead of:

brg.BaseStream.Seek(absoluteSectionOffset, SeekOrigin.Begin);

This meant that the extracted compressed data was an amalgam of the first headerless section plus part of the 2nd section along with its header. As the first section is 256 bytes long without its header, this part was being decompressed correctly by the GZipInputStream. But after that come the 6 bytes of header, which break the decompression, resulting in the rest of the output being 00s.

There was no explicit error being thrown by the GZipInputStream when this happened, so I'd incorrectly assumed that the cause was the .Read or something in the stream retaining data from the previous pass. Sorry for the hassle.
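To illustrate the fix, here is a minimal, self-contained sketch of seeking to each section's absolute post-header offset before decompressing. It uses the BCL's System.IO.Compression.GZipStream in place of SharpZipLib's GZipInputStream, and the 6-byte header, payloads, and offsets are hypothetical stand-ins for the real file layout:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class SectionDemo
{
    // Build one "section": a placeholder 6-byte header followed by the
    // GZip-compressed payload (mirrors the layout described above).
    public static byte[] MakeSection(byte[] payload)
    {
        using var ms = new MemoryStream();
        ms.Write(new byte[6], 0, 6); // hypothetical 6-byte section header
        using (var gz = new GZipStream(ms, CompressionMode.Compress, leaveOpen: true))
            gz.Write(payload, 0, payload.Length);
        return ms.ToArray();
    }

    // Seek to the section's *absolute* post-header offset (not a fixed 6),
    // copy out exactly the compressed bytes, then decompress them.
    public static byte[] DecompressSection(Stream baseStream,
        long absoluteSectionOffset, int compressedLength)
    {
        baseStream.Seek(absoluteSectionOffset, SeekOrigin.Begin);
        byte[] compressedSection = new byte[compressedLength];
        int read = 0;
        while (read < compressedLength) // Read may return fewer bytes: loop
        {
            int n = baseStream.Read(compressedSection, read, compressedLength - read);
            if (n == 0) throw new EndOfStreamException();
            read += n;
        }
        using var msi = new MemoryStream(compressedSection);
        using var gz = new GZipStream(msi, CompressionMode.Decompress);
        using var outMs = new MemoryStream();
        gz.CopyTo(outMs); // CopyTo loops internally, unlike a single .Read
        return outMs.ToArray();
    }

    public static void Main()
    {
        byte[] s1 = MakeSection(Encoding.ASCII.GetBytes("first section"));
        byte[] s2 = MakeSection(Encoding.ASCII.GetBytes("second section"));
        byte[] file = new byte[s1.Length + s2.Length];
        s1.CopyTo(file, 0);
        s2.CopyTo(file, s1.Length);

        using var baseStream = new MemoryStream(file);
        // Section 1 starts after its header at offset 6; section 2 after
        // its own header at s1.Length + 6.
        Console.WriteLine(Encoding.ASCII.GetString(
            DecompressSection(baseStream, 6, s1.Length - 6)));
        Console.WriteLine(Encoding.ASCII.GetString(
            DecompressSection(baseStream, s1.Length + 6, s2.Length - 6)));
    }
}
```

Because each section is copied into its own MemoryStream before decompression, the decompressor never sees the next section's header, which is what corrupted the output in the original code.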
