zlib avail_in is garbage after inflate call

I am in the process of tracking down a bug that involves zlib and could use some help understanding a peculiar behavior.

The relevant code boils down to the following, and I can reproduce the issue with it in isolation:

#include <algorithm>  // std::min, std::copy
#include <zlib.h>     // z_stream, inflateInit, inflate

void unzip(unsigned char* input, unsigned remainingInput, unsigned char* destination, unsigned destSize)
{
  z_stream* zStream = new z_stream;
  zStream->zalloc = nullptr;
  zStream->zfree = nullptr;
  zStream->opaque = nullptr;
  // reset input and output
  zStream->next_in = nullptr;
  zStream->avail_in = 0UL;
  zStream->next_out = nullptr;
  zStream->avail_out = 0UL;
  int zlibResult = inflateInit(zStream);
  // [error handling omitted]

  constexpr unsigned maxBufferSize = 256;
  unsigned char buffer[maxBufferSize];

  zStream->next_out = destination;
  zStream->avail_out = destSize;

  // process input until output buffer is full
  do
  {
    // get next chunk of input, if needed and more input is available
    if (input != nullptr && zStream->avail_in == 0UL)
    {
      unsigned nextSize = std::min(remainingInput, maxBufferSize);
      std::copy(input, input + nextSize, buffer);
      input += nextSize;
      remainingInput -= nextSize;
      zStream->next_in = buffer;
      zStream->avail_in = nextSize;

      // end input as soon as all input has been processed
      if (zStream->avail_in < maxBufferSize)
        input = nullptr;
    }
    zlibResult = inflate(zStream, Z_NO_FLUSH);
  } while (zlibResult == Z_OK && zStream->avail_out > 0UL);
  if (zlibResult == Z_STREAM_END)
  {
    // [call inflateEnd, cleanup, etc.]
  }
  // else
  //   [error handling]
  // [cleanup]
}

The compressed data was deflated earlier in the program. For very specific input data, this inflating occasionally fails. This happens even when multithreading is removed (there are many blobs of data being deflated and inflated to decrease peak memory usage, but the error always happens for the very same blob), so our current theory is some off-by-one garbage memory read.

In particular, we always observe zStream->avail_in becoming nonsensical after the problematic inflate call. Some of those times, it will return Z_DATA_ERROR (with msg "incorrect data check").

The data to inflate has size 13370. The inflated data should have size 50034.

Here is a dump of the zStream states...

  • ...right before the first inflate call: [image]

  • ...right before the call that behaves strangely: [image]

  • ...right after the call that behaves strangely: [image]

I have searched for reports of similar behavior but couldn't turn up anything. The above looks like zlib actually read way further than the 256 bytes it was asked to consume next (causing avail_in to wrap around; note that 4294965331 = 2^32 - 1965).
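The wraparound arithmetic can be checked in isolation. The following is only a sketch: the helper names remainingAfter and overrunOf are made up for illustration, and it assumes avail_in is a 32-bit unsigned counter (as zlib's uInt is on common platforms). The starting value of 256 is an assumption chosen to match the chunk size, not a value taken from the dumps.

```cpp
#include <cstdint>

// Hypothetical model of an unsigned "bytes remaining" counter:
// unsigned subtraction wraps modulo 2^32 instead of going negative.
constexpr std::uint32_t remainingAfter(std::uint32_t available, std::uint32_t consumed)
{
  return available - consumed;
}

// Given a counter that wrapped below zero, recover by how many bytes
// it went past zero (unsigned negation, also modulo 2^32).
constexpr std::uint32_t overrunOf(std::uint32_t wrappedCounter)
{
  return std::uint32_t{0} - wrappedCounter;
}

// The value observed in the question corresponds to going 1965 bytes below zero.
static_assert(overrunOf(4294965331u) == 1965u, "4294965331 == 2^32 - 1965");

// Assumed scenario: 256 bytes were available, 256 + 1965 were consumed.
static_assert(remainingAfter(256u, 256u + 1965u) == 4294965331u, "counter wraps");
```

If both static_asserts compile, the observed value is consistent with the consumer having advanced 1965 bytes further than the counter allowed.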

Where should I look next in debugging this? Is it expected/normal that the zStream state is nonsensical when a data check fails, or is this indicative of something going wrong inside the lib (but why would it ever read more than the 256 bytes it was asked to read)? Is the above code bad or can this be caused by corrupt data in input (or, no matter how unlikely, is there actually a zlib bug here)?

The issue is luckily very reproducible, so I can provide more information/data if that would help (but I don't want to dump 13 KB of zipped data into the question for now).


Edit: I just stepped through the zlib (1.2.11) sources, and witnessed something resembling an off-by-one here:

// inflate.c:1044
    case LEN:
        if (have >= 6 && left >= 258) {
            RESTORE();
            inflate_fast(strm, out);
            LOAD();
            if (state->mode == TYPE)
                state->back = -1;
            break;

This is reached with have == 6 (and left > 6000). After this state transition, have == -1U (and strm->next_in is 257 bytes after buffer, so there was an out-of-bounds read on buffer). So at this point either...

  • ...the above code is plain wrong and misuses zlib in some way, or...

  • ...the data is corrupted somehow and zlib runs into UB as a consequence (is this intended/allowed?), or...

  • ...there is a plain bug in zlib.
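One way to pin down the exact call at which the stream state goes bad is to check, after every inflate call, that the input cursor still lies inside the chunk that was handed in. The helper below is a hypothetical debugging aid, not part of zlib. It takes plain pointers and counts so that it compiles without zlib; in the code from the question one would presumably pass buffer, nextSize, zStream->next_in and zStream->avail_in.

```cpp
#include <cstddef>

// Hypothetical debugging helper (not a zlib function).
// chunk/supplied: the input chunk handed to the stream before the call.
// nextIn/availIn: the stream's input cursor and counter after the call.
// Both pointers are assumed to point into the same underlying array.
bool inputCursorIsSane(const unsigned char* chunk, unsigned supplied,
                       const unsigned char* nextIn, unsigned availIn)
{
  const std::ptrdiff_t consumed = nextIn - chunk;

  // Check the bounds first. Testing only "consumed + availIn == supplied"
  // in unsigned arithmetic would miss the case from the question:
  // 257 + (2^32 - 1) wraps around to exactly 256.
  if (consumed < 0 || static_cast<std::size_t>(consumed) > supplied)
    return false;

  // Inside the chunk, the counter must match what is left of it.
  return availIn == supplied - static_cast<unsigned>(consumed);
}
```

With the values reported above (cursor 257 bytes past the start of a 256-byte chunk, counter equal to -1U) this check returns false, so an assertion on it directly after the inflate call would stop at the first bad transition.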

I can only reproduce this bug in builds of zlib that use the contributed handcrafted assembler code for key functions, including inflate_fast. The default C implementation does not show this issue.

My conclusion for now is that there is a bug in the contributed assembler code, specifically the x86/x64 versions (the bug is reproducible with both), but not in the original plain C code.

I will inform the respective authors and update this answer when new information becomes available.
