Unzipping into a ByteArrayOutputStream -- why am I getting an EOFException?

I have been trying to create a Java program that will read zip files from an online API, unzip them into memory (not into the file system), and load them into a database. Since the unzipped files need to be loaded into the database in a specific order, I have to unzip all of the files before I load any of them.

I basically used another question on StackOverflow as a model for how to do this. Using ZipInputStream from java.util.zip, I was able to do this with a smaller ZIP (0.7MB zipped, ~4MB unzipped), but when I encountered a larger file (25MB zipped, 135MB unzipped), the two largest files were not read into memory. I was not even able to retrieve a ZipEntry for these larger files (8MB and 120MB, the latter making up the vast majority of the data in the zip file). No exceptions were thrown, and my program proceeded until it tried to access the unzipped files that had failed to be written, and threw a NullPointerException.

I am using Jsoup to download the zip file.

Has anyone had any experience with this, and can anyone give guidance on why I am unable to retrieve the complete contents of the zip file?

Below is the code that I am using. I am collecting the unzipped files as InputStreams in a HashMap, and the program should stop looking for ZipEntry objects when there are no more left.

    private Map<String, InputStream> unzip(ZipInputStream verZip) throws IOException {

        Map<String, InputStream> result = new HashMap<>();

        while (true) {
            ZipEntry entry;
            byte[] b = new byte[1024];
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int l;

            entry = verZip.getNextEntry();//Might throw IOException

            if (entry == null) {
                break;
            }

            try {
                while ((l = verZip.read(b)) > 0) {
                    out.write(b, 0, l);
                }
                out.flush();
            } catch (EOFException e) {
                e.printStackTrace();
            } catch (IOException i) {
                System.out.println("there was an ioexception");
                i.printStackTrace();
                fail();
            }
            result.put(entry.getName(), new ByteArrayInputStream(out.toByteArray()));
        }
        return result;
    }

Might I be better off if my program used the filesystem to unzip the files?

It turns out that Jsoup is the root of the issue. When obtaining binary data with a Jsoup connection, there is a limit on how many bytes will be read from the connection. By default, this limit is 1,048,576 bytes, or 1 megabyte. As a result, when I feed the binary data from Jsoup into a ZipInputStream, the data is cut off after one megabyte. This limit, maxBodySizeBytes, can be found in org.jsoup.helper.HttpConnection.Request.

        Connection c = Jsoup.connect("https://example.com/download").ignoreContentType(true);
        // ^^ returns a Connection that will only retrieve 1MB of data
        InputStream oneMb = c.execute().bodyStream();
        ZipInputStream oneMbZip = new ZipInputStream(oneMb);

Trying to unzip the truncated oneMbZip is what led to the EOFException.
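The effect of truncation can be reproduced without any network access. The following is a minimal sketch using only java.util.zip: it builds an archive in memory, cuts it in half, and counts how many entries can still be read to the end. The class and method names (`TruncatedZipDemo`, `buildZip`, `countReadableEntries`) are made up for this illustration and are not part of the original code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class: simulates a download that was cut off part-way.
final class TruncatedZipDemo {

    /** Builds a zip archive in memory with `count` entries of `size` random bytes each. */
    static byte[] buildZip(int count, int size) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        Random random = new Random(42); // random data barely compresses
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            for (int i = 0; i < count; i++) {
                byte[] data = new byte[size];
                random.nextBytes(data);
                zip.putNextEntry(new ZipEntry("file" + i + ".bin"));
                zip.write(data);
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Counts the entries that can be read to the end; stops at the first IOException. */
    static int countReadableEntries(byte[] zipBytes) {
        int complete = 0;
        byte[] chunk = new byte[8192];
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            while (zip.getNextEntry() != null) {
                while (zip.read(chunk) != -1) {
                    // discard the data; only completeness matters here
                }
                complete++;
            }
        } catch (IOException e) {
            // A truncated entry typically surfaces here as an EOFException.
            System.out.println("Stopped early: " + e);
        }
        return complete;
    }

    public static void main(String[] args) {
        byte[] full = buildZip(4, 100_000);
        byte[] truncated = Arrays.copyOf(full, full.length / 2); // simulate the cut-off
        System.out.println("Entries read from full archive: " + countReadableEntries(full));
        System.out.println("Entries read from truncated archive: " + countReadableEntries(truncated));
    }
}
```

If the exception is caught and the loop carries on, as in the question's code, the next call to getNextEntry() finds no more data and returns null, which would explain why the largest entries were silently missing rather than reported as an error.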

With the code below, I was able to change the Connection's byte limit to 1 GB (1073741824), and was then able to retrieve the zip file without running into an EOFException.

        Connection c = Jsoup.connect("https://example.com/download").ignoreContentType(true);
        // By default, this Connection will only retrieve 1MB of data
        Connection.Request theRequest = c.request();
        theRequest.maxBodySize(1073741824); // raise the limit to 1GB
        c.request(theRequest); // now this connection will retrieve as much as 1GB of data
        InputStream oneGb = c.execute().bodyStream();
        ZipInputStream oneGbZip = new ZipInputStream(oneGb);

Note that maxBodySizeBytes is an int, so its upper limit is 2,147,483,647, or just under 2GB.
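Once the complete archive arrives, unzipping into memory as in the question is workable. Below is a compact sketch of that step using only the standard library. It assumes Java 9 or later (for `readAllBytes`), and the names `InMemoryUnzip` and `unzip` are illustrative rather than taken from the original program. It stores each entry as a `byte[]` in a `LinkedHashMap` so the archive order is preserved.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, shown as a sketch only.
final class InMemoryUnzip {

    /** Reads every file entry of the zip stream into memory, keyed by entry name. */
    static Map<String, byte[]> unzip(InputStream source) throws IOException {
        Map<String, byte[]> result = new LinkedHashMap<>(); // keeps archive order
        try (ZipInputStream zip = new ZipInputStream(source)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories carry no data
                }
                // On a ZipInputStream, reading stops at the end of the current entry.
                result.put(entry.getName(), zip.readAllBytes());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Build a small archive in memory to stand in for the downloaded file.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            zip.putNextEntry(new ZipEntry("a.txt"));
            zip.write("first".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("b.txt"));
            zip.write("second".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        Map<String, byte[]> files = unzip(new ByteArrayInputStream(buffer.toByteArray()));
        files.forEach((name, data) -> System.out.println(name + ": " + data.length + " bytes"));
    }
}
```

Unlike the loop in the question, this version lets an IOException (including an EOFException from a truncated download) propagate to the caller, so an incomplete archive fails loudly instead of producing a map with missing entries. Each `byte[]` can be wrapped in a `ByteArrayInputStream` at load time if a stream is needed.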
