Reading from a ZipInputStream into a ByteArrayOutputStream

I am trying to read a single file from a java.util.zip.ZipInputStream and copy it into a java.io.ByteArrayOutputStream. The goal is to create a java.io.ByteArrayInputStream from it and hand that to a third-party library that will end up closing the stream, because I don't want my ZipInputStream to get closed.

I'm probably missing something basic, but I never enter the while loop here:

ByteArrayOutputStream streamBuilder = new ByteArrayOutputStream();
int bytesRead;
byte[] tempBuffer = new byte[8192*2];
try {
    while ((bytesRead = zipStream.read(tempBuffer)) != -1) {
        streamBuilder.write(tempBuffer, 0, bytesRead);
    }
} catch (IOException e) {
    // ...
}

What am I missing that will allow me to copy the stream?

Edit:

I should have mentioned earlier that this ZipInputStream is not coming from a file on disk, so I don't think I can use a ZipFile. It is coming from a file uploaded through a servlet.

Also, I have already called getNextEntry() on the ZipInputStream before getting to this snippet of code. If I don't copy the file into another InputStream (via the OutputStream mentioned above) and just pass the ZipInputStream to my third-party library, the library closes the stream, and I can't do anything more with it, such as dealing with the remaining files in the stream.

You probably tried reading from a FileInputStream like this:

ZipInputStream in = new ZipInputStream(new FileInputStream(...));

This won't work, since a zip archive can contain multiple files and you need to specify which file to read.

You could use java.util.zip.ZipFile together with a library that helps you copy the stream, such as IOUtils from Apache Commons IO or ByteStreams from Guava.

Example:

ByteArrayOutputStream out = new ByteArrayOutputStream();
try (ZipFile zipFile = new ZipFile("foo.zip")) {
    ZipEntry zipEntry = zipFile.getEntry("fileInTheZip.txt");

    try (InputStream in = zipFile.getInputStream(zipEntry)) {
        IOUtils.copy(in, out);
    }
}

Your loop looks valid. What does the following code (just on its own) return?

zipStream.read(tempBuffer)

If it returns -1, then the zipStream has nothing left to read by the time you get it (it is either at the end of an entry or not positioned on one), and all bets are off. It's time to use your debugger and make sure that what's being passed to you is actually valid.

When you call getNextEntry(), does it return a value, and is the data in the entry meaningful (i.e. does getCompressedSize() return a valid value)? If you are reading a zip file that doesn't have read-ahead zip entries embedded, then ZipInputStream isn't going to work for you.

Some useful tidbits about the Zip format:

Each file embedded in a zip file has a header. This header can contain useful information (such as the compressed length of the stream, its offset in the file, and the CRC), or it can contain some magic values that basically say "the information isn't in the stream header, you have to check the Zip post-amble".

Each zip file then has a table attached to the end of the file that lists all of the zip entries along with their real metadata. The table at the end is mandatory, and the values in it must be correct. In contrast, the values embedded in the stream do not have to be provided.

If you use ZipFile, it reads the table at the end of the zip. If you use ZipInputStream, I suspect that getNextEntry() attempts to use the entries embedded in the stream. If those values aren't specified, then ZipInputStream has no idea how long the stream might be. The inflate algorithm is self-terminating (you don't actually need to know the uncompressed length of the output stream in order to fully recover the output), but it's possible that the Java version of this reader doesn't handle this situation very well.
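The difference between the per-entry header and the table at the end can be observed with a small experiment. The following is only a sketch under stated assumptions: the class and method names (StreamedEntryDemo, buildZip, sizeSeenInLocalHeader, readFirstEntry) are made up for illustration, and the archive is built in memory with ZipOutputStream without setting the size or CRC on the entry. On the JDKs I am aware of, such an entry reports a size of -1 right after getNextEntry(), yet the usual read loop still recovers the full content, which suggests the reader copes with the self-terminating deflate data in this setup.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; none of these names come from the original thread.
class StreamedEntryDemo {

    /** Builds a one-entry zip in memory, the way a streaming writer would. */
    static byte[] buildZip(String name, String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            // Size and CRC are not set on the entry, so they are not known
            // at the time the per-entry header is written.
            zip.putNextEntry(new ZipEntry(name));
            zip.write(text.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    /** Returns the size the first entry reports right after getNextEntry(). */
    static long sizeSeenInLocalHeader(byte[] zipBytes) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry = zip.getNextEntry();
            return entry.getSize();
        }
    }

    /** Reads the first entry to its end without relying on a declared size. */
    static String readFirstEntry(byte[] zipBytes) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            zip.getNextEntry();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            // read() returns -1 at the end of the current entry.
            while ((n = zip.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

Calling sizeSeenInLocalHeader(buildZip("a.txt", "hello")) is a quick way to check what your own JDK reports for such an entry.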

I will say that it's fairly unusual to have a servlet returning a ZipInputStream (it's much more common to receive an InflaterInputStream if you are going to be receiving compressed content).

I would use IOUtils from the Commons IO project:

IOUtils.copy(zipStream, byteArrayOutputStream);
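If adding Commons IO is not an option, a JDK-only equivalent can be sketched with InputStream.transferTo, which is available from Java 9 onward. The class and method names here (CopyWithoutCommonsIo, detach) are hypothetical. When the source is a ZipInputStream positioned on an entry, "end of stream" means the end of that entry, and transferTo does not close either stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper; assumes Java 9 or newer for InputStream.transferTo.
class CopyWithoutCommonsIo {

    /**
     * Copies everything that remains in source into memory and returns an
     * independent stream over the copy. The source is left open.
     */
    static ByteArrayInputStream detach(InputStream source) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        source.transferTo(buffer); // reads until read() reports end of stream
        return new ByteArrayInputStream(buffer.toByteArray());
    }
}
```

The returned ByteArrayInputStream can be handed to code that closes its input, since closing it has no effect on the original stream.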

You're missing the call

ZipEntry entry = (ZipEntry) zipStream.getNextEntry();

to position the stream at the first decompressed byte of the first entry.

 ByteArrayOutputStream streamBuilder = new ByteArrayOutputStream();
 int bytesRead;
 byte[] tempBuffer = new byte[8192*2];
 ZipEntry entry = (ZipEntry) zipStream.getNextEntry();
 try {
     while ( (bytesRead = zipStream.read(tempBuffer)) != -1 ){
        streamBuilder.write(tempBuffer, 0, bytesRead);
     }
 } catch (IOException e) {
     // ...
 }

You could implement your own wrapper around the ZipInputStream that ignores close() and hand that off to the third-party library.

thirdPartyLib.handleZipData(new CloseIgnoringInputStream(zipStream));


class CloseIgnoringInputStream extends InputStream
{
    private ZipInputStream stream;

    public CloseIgnoringInputStream(ZipInputStream inStream)
    {
        stream = inStream;
    }

    public int read() throws IOException {
        return stream.read();
    }

    public void close()
    {
        //ignore
    }

    public void reallyClose() throws IOException
    {
        stream.close();
    }
}
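A shorter variant of the same idea extends java.io.FilterInputStream, which already forwards every read method (including the bulk read(byte[], int, int)) to the wrapped stream, so only close() needs overriding. This is a sketch, not a drop-in replacement for the class above; the name NonClosingInputStream is hypothetical.

```java
import java.io.FilterInputStream;
import java.io.InputStream;

// Hypothetical wrapper: forwards all reads, swallows close().
class NonClosingInputStream extends FilterInputStream {

    NonClosingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public void close() {
        // Intentionally a no-op: the owner of the wrapped stream stays
        // responsible for closing it.
    }
}
```

After the library calls close() on the wrapper, the caller can keep calling getNextEntry() on the underlying ZipInputStream and close it when all entries are processed.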

I would call getNextEntry() on the ZipInputStream until it is at the entry you want (use ZipEntry.getName() etc.). Calling getNextEntry() advances the "cursor" to the beginning of the entry that it returns. Then, use ZipEntry.getSize() to determine how many bytes you should read using zipInputStream.read().
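The "advance until the name matches, then read" approach might look like the sketch below. The names ZipEntryReader and readEntry are hypothetical. One deliberate deviation from the description above: the loop reads until read() returns -1 instead of counting bytes against getSize(), because getSize() can return -1 when the size is not recorded in the entry's header.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of any library.
class ZipEntryReader {

    /**
     * Advances zip until an entry named wantedName is found and returns its
     * uncompressed bytes, or null if the stream has no such entry.
     * The ZipInputStream is left open so the caller can keep using it.
     */
    static byte[] readEntry(ZipInputStream zip, String wantedName) throws IOException {
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            if (!entry.getName().equals(wantedName)) {
                continue; // the next getNextEntry() skips this entry's data
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = zip.read(buffer)) != -1) { // -1 marks the end of this entry
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        }
        return null;
    }
}
```

The resulting array can be wrapped in a new ByteArrayInputStream and passed to the third-party library without affecting the ZipInputStream.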

Please try the code below:

private static byte[] getZipArchiveContent(File zipName) throws WorkflowServiceBusinessException {

  BufferedInputStream buffer = null;
  FileInputStream fileStream = null;
  ByteArrayOutputStream byteOut = null;
  byte data[] = new byte[BUFFER];

  try {
   try {
    fileStream = new FileInputStream(zipName);
    buffer = new BufferedInputStream(fileStream);
    byteOut = new ByteArrayOutputStream();

    int count;
    while((count = buffer.read(data, 0, BUFFER)) != -1) {
     byteOut.write(data, 0, count);
    }
   } catch(Exception e) {
    throw new WorkflowServiceBusinessException(e.getMessage(), e);
   } finally {
    if(null != fileStream) {
     fileStream.close();
    }
    if(null != buffer) {
     buffer.close();
    }
    if(null != byteOut) {
     byteOut.close();
    }
   }
  } catch(Exception e) {
   throw new WorkflowServiceBusinessException(e.getMessage(), e);
  }
  return byteOut.toByteArray();

 }

It is unclear how you got the zipStream. It should work when you get it like this:

  zipStream = zipFile.getInputStream(zipEntry)

If you are obtaining the input stream from a ZipFile, you can get one stream for the third-party library, let the library use it, and then obtain another input stream using the code above.

Remember, an input stream is a cursor. If you have the entire data (as with a ZipFile), you can ask for N cursors over it.

A different case is if you only have a "GZip" input stream, that is, only a zipped byte stream. In that case your ByteArrayOutputStream buffer makes perfect sense.

Check whether the input stream is positioned at the beginning.

Otherwise, as for the implementation: I do not think you need to write to the result stream while you are reading, unless you process this exact stream in another thread.

Just create a byte array, read the input stream into it, and then create the output stream.
