Unzipping to a ByteArrayOutputStream: why am I getting an EOFException?

Question

I have been trying to write a Java program that reads zip files from an online API, unzips them in memory (not on the file system), and loads them into a database. Because the unzipped files must be loaded into the database in a specific order, I have to unzip all of them before loading any.

I basically used another StackOverflow question as a model for how to do this. Using ZipInputStream from java.util.zip, I was able to do this with a smaller ZIP (0.7MB zipped, ~4MB unzipped), but when I encountered a larger file (25MB zipped, 135MB unzipped), the two largest files were not read into memory. I was not even able to retrieve a ZipEntry for these larger files (8MB and 120MB, the latter making up the vast majority of the data in the zip file). No exceptions were thrown, and my program proceeded until it tried to access the unzipped files that had failed to be written, at which point it threw a NullPointerException.

I am using Jsoup to download the zip file.

Has anyone had experience with this and can explain why I am unable to retrieve the complete contents of the zip file?

Below is the code that I am using. I am collecting the unzipped files as InputStreams in a HashMap, and the program should stop looking for ZipEntrys once there are none left.

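    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.EOFException;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipInputStream;

    // fail() is not defined here; it is presumably a statically imported JUnit method.
    private Map<String, InputStream> unzip(ZipInputStream verZip) throws IOException {

        Map<String, InputStream> result = new HashMap<>();
        byte[] b = new byte[1024]; // copy buffer, reused for every entry
        ZipEntry entry;

        // getNextEntry() returns null once there are no more entries; it might throw IOException
        while ((entry = verZip.getNextEntry()) != null) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int l;

            try {
                // read() returns -1 at the end of the current entry
                while ((l = verZip.read(b)) != -1) {
                    out.write(b, 0, l);
                }
            } catch (EOFException e) {
                // the underlying stream ended before the entry was complete;
                // whatever was read so far is still stored below
                e.printStackTrace();
            } catch (IOException i) {
                System.out.println("there was an ioexception");
                i.printStackTrace();
                fail();
            }
            result.put(entry.getName(), new ByteArrayInputStream(out.toByteArray()));
        }
        return result;
    }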
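One way to rule out the unzip loop itself is to run the same logic against an archive built in memory, with no network involved. The sketch below is a hypothetical harness, not part of the original program: the class name `InMemoryUnzipCheck`, the helper `zip`, and the entry names are made up for illustration. It also stores entries as `byte[]` in a `LinkedHashMap`, an assumption chosen here so that archive order is preserved.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class InMemoryUnzipCheck {

    // Hypothetical helper: builds a zip archive in memory from name -> content pairs.
    public static byte[] zip(Map<String, byte[]> files) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> f : files.entrySet()) {
                zos.putNextEntry(new ZipEntry(f.getKey()));
                zos.write(f.getValue());
                zos.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    // Same read loop as in the question, but returning the bytes of each entry.
    public static Map<String, byte[]> unzip(byte[] archive) throws IOException {
        Map<String, byte[]> result = new LinkedHashMap<>();
        byte[] buffer = new byte[1024];
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                int n;
                while ((n = zis.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                result.put(entry.getName(), out.toByteArray());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("first.csv", "a,b\n1,2\n".getBytes(StandardCharsets.UTF_8));
        files.put("second.csv", "c,d\n3,4\n".getBytes(StandardCharsets.UTF_8));

        // Round trip: every entry that went in should come back out, in order.
        for (Map.Entry<String, byte[]> e : unzip(zip(files)).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().length + " bytes");
        }
    }
}
```

If a complete archive round-trips cleanly like this, missing entries in the real program point at the bytes arriving from upstream rather than at the read loop.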

Would I be better off if my program used the file system to unzip the files?

Answer 1

It turns out that Jsoup is the root of the issue. When obtaining binary data with a Jsoup connection, there is a limit on how many bytes will be read from the connection. By default, this limit is 1048576 bytes, or 1 megabyte. As a result, when I feed the binary data from Jsoup into a ZipInputStream, the resulting data is cut off after one megabyte. This limit, maxBodySizeBytes, can be found in org.jsoup.helper.HttpConnection.Request.

        Connection c = Jsoup.connect("https://example.com/download").ignoreContentType(true);
        //^^returns a Connection that will only retrieve 1MB of data
        InputStream oneMb = c.execute().bodyStream();
        ZipInputStream oneMbZip = new ZipInputStream(oneMb);

Trying to unzip the truncated oneMbZip is what led to the EOFException.
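The effect of a cut-off download can be reproduced without Jsoup or a network. The sketch below is an illustration under stated assumptions, not the original program: the class `TruncatedZipDemo`, its methods, the entry name `big.bin`, and the 64 KB payload are all hypothetical. It builds a small archive in memory, chops it in half to mimic a capped response body, and reads both versions with `ZipInputStream`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class TruncatedZipDemo {

    // Builds a one-entry archive holding pseudo-random (poorly compressible) bytes.
    public static byte[] buildArchive(int payloadSize) throws IOException {
        byte[] payload = new byte[payloadSize];
        new Random(42).nextBytes(payload);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry("big.bin"));
            zos.write(payload);
            zos.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Reads every entry to the end; returns true if an EOFException was raised.
    public static boolean hitsEof(byte[] archive) throws IOException {
        byte[] buffer = new byte[1024];
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(archive))) {
            while (zis.getNextEntry() != null) {
                while (zis.read(buffer) != -1) {
                    // discard the data; only the outcome matters here
                }
            }
            return false;
        } catch (EOFException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] full = buildArchive(64 * 1024);
        // Keep only the first half of the bytes, as a capped download would.
        byte[] truncated = Arrays.copyOf(full, full.length / 2);
        System.out.println("full archive hits EOF: " + hitsEof(full));
        System.out.println("truncated archive hits EOF: " + hitsEof(truncated));
    }
}
```

Here the cut lands in the middle of the compressed entry data, so the read fails partway through the entry. Where exactly the cut falls matters: if it lands on or inside an entry header instead, the JDK may behave differently (for example, `getNextEntry()` may just return null), which would look like entries silently going missing. That part depends on the JDK implementation and is worth checking against the version in use.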

With the code below, I was able to change the Connection's byte limit to 1 GB (1073741824), and was then able to retrieve the zip file without running into an EOFException.

        Connection c = Jsoup.connect("https://example.com/download").ignoreContentType(true);
        //^^returns a Connection that will only retrieve 1MB of data
        Connection.Request theRequest = c.request();
        theRequest.maxBodySize(1073741824);
        c.request(theRequest);//Now this connection will retrieve as much as 1GB of data
        InputStream oneGb = c.execute().bodyStream();
        ZipInputStream oneGbZip = new ZipInputStream(oneGb);

Note that maxBodySizeBytes is an int, so its upper limit is 2,147,483,647 bytes, or just under 2GB.
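The byte counts quoted in this answer can be checked with plain Java arithmetic. The class and constant names below (`BodySizeLimits`, `ONE_MB`, `ONE_GB`, `fitsInInt`) are hypothetical and exist only for this sketch; nothing here calls Jsoup.

```java
public class BodySizeLimits {

    public static final int ONE_MB = 1024 * 1024;         // 1,048,576 bytes (2^20)
    public static final int ONE_GB = 1024 * 1024 * 1024;  // 1,073,741,824 bytes (2^30)

    // True if a byte count can be passed as a non-negative int.
    public static boolean fitsInInt(long bytes) {
        return bytes >= 0 && bytes <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println("1 MB    = " + ONE_MB);
        System.out.println("1 GB    = " + ONE_GB);
        System.out.println("int max = " + Integer.MAX_VALUE); // 2^31 - 1
        // 2 GB is 2^31, one more than Integer.MAX_VALUE, so it does not fit.
        System.out.println("2 GB fits in an int: " + fitsInInt(2L * ONE_GB));
    }
}
```

For downloads that may exceed the int range, Jsoup's Javadoc for `Connection.maxBodySize` describes a value of 0 as meaning no limit; treat that as something to confirm against the Jsoup version in use.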

Solution 1
1 2019-12-13 19:23:39
