How to estimate zip file size in Java before creating it

I have a requirement to create a zip file from a list of available files. The files are of different types, such as txt, pdf, and xml. I am using the Java util classes to do it.

The requirement is to keep the zip file within a maximum size of 5 MB. I should select files from the list based on timestamp and add them to the zip until the zip file size reaches 5 MB, then skip the remaining files.

Is there a way in Java to estimate the zip file size in advance, without creating the actual file?

Or is there another approach to handle this?

Wrap your ZipOutputStream in a custom OutputStream, called YourOutputStream here.

  • The constructor of YourOutputStream takes the real ZipOutputStream (zos1 below) and creates a second ZipOutputStream (zos2) that wraps a new ByteArrayOutputStream (baos)
    public YourOutputStream(ZipOutputStream zos, int maxSizeInBytes)
  • When you write a file with YourOutputStream, it first writes the file to zos2
    public void writeFile(File file) throws ZipFileFullException
    public void writeFile(String path) throws ZipFileFullException
    etc...
  • if baos.size() is under maxSizeInBytes
    • Write the file to zos1
  • else
    • close zos1, baos, and zos2, and throw an exception. I can't think of an existing exception that fits; if there is one, use it, otherwise create your own IOException subclass, ZipFileFullException.

You need two ZipOutputStreams: one that is written to your drive, and one to check whether your content is over 5 MB.
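The two-stream idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: `SizeLimitedZipWriter` and `tryAdd` are hypothetical names, file contents are passed as byte arrays for brevity, and a `false` return value stands in for the ZipFileFullException mentioned in the answer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper illustrating the two-stream idea; not a standard class.
final class SizeLimitedZipWriter implements AutoCloseable {
    private final ZipOutputStream zos1;                                    // the real archive
    private final ByteArrayOutputStream baos = new ByteArrayOutputStream();
    private final ZipOutputStream zos2 = new ZipOutputStream(baos);       // in-memory trial copy
    private final int maxSizeInBytes;
    private boolean full;

    SizeLimitedZipWriter(OutputStream target, int maxSizeInBytes) {
        this.zos1 = new ZipOutputStream(target);
        this.maxSizeInBytes = maxSizeInBytes;
    }

    /** Adds the entry only if the trial copy stays within the limit; returns false once full. */
    boolean tryAdd(String name, byte[] content) throws IOException {
        if (full) {
            return false;
        }
        writeEntry(zos2, name, content);
        // baos.size() covers the entry headers and data written so far, but not the
        // central directory that close() appends, so leave headroom in maxSizeInBytes.
        if (baos.size() > maxSizeInBytes) {
            full = true; // the trial copy now holds the rejected entry, so stop here
            return false;
        }
        writeEntry(zos1, name, content);
        return true;
    }

    private static void writeEntry(ZipOutputStream zos, String name, byte[] content)
            throws IOException {
        zos.putNextEntry(new ZipEntry(name));
        zos.write(content);
        zos.closeEntry();
    }

    @Override
    public void close() throws IOException {
        zos2.close();
        zos1.close();
    }
}
```

Every file is compressed twice here, and the trial copy keeps the whole archive in memory, which is probably acceptable for a 5 MB limit but not for large archives.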

EDIT: In fact, I checked, and you can't easily remove a ZipEntry.

http://download.oracle.com/javase/6/docs/api/java/io/ByteArrayOutputStream.html#size()

+1 for Colin Herbert: add files one by one, and either go back to the previous step or remove the last file if the archive is too big. I just want to add some details:

Prediction is way too unreliable. E.g. a PDF can contain uncompressed text and compress down to 30% of the original, or it can contain already-compressed text and images and only compress to 80%. You would need to inspect the entire PDF for compressibility, which basically means having to compress it.

You could try a statistical prediction, which could reduce the number of failed attempts, but you would still have to implement the recommendation above. Go with the simpler implementation first, and see if it's enough.

Alternatively, compress the files individually, then pick the files that won't exceed 5 MB when bound together. If unpacking is automated too, you could bind the individual zip files into a single uncompressed zip file.
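One way to try the "compress individually, then pick" idea is to zip each file on its own in memory and add up the standalone sizes. The sketch below assumes file contents are available as byte arrays and that the map iterates in priority order; `PerFileZipSizes`, `singleEntryZipSize` and `pick` are hypothetical names. Because each standalone zip carries its own end-of-archive record, the sum slightly overestimates a combined archive built with the same settings, so the budget errs on the safe side.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: measures each file as its own single-entry zip.
final class PerFileZipSizes {

    /** Size in bytes of a zip archive that contains only this one entry. */
    static int singleEntryZipSize(String name, byte[] content) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(baos)) {
            zos.putNextEntry(new ZipEntry(name));
            zos.write(content);
            zos.closeEntry();
        }
        return baos.size(); // read after close, so the central directory is included
    }

    /** Picks files, in iteration order, whose standalone zip sizes sum to at most maxBytes. */
    static List<String> pick(Map<String, byte[]> filesInPriorityOrder, long maxBytes)
            throws IOException {
        List<String> chosen = new ArrayList<>();
        long used = 0;
        for (Map.Entry<String, byte[]> e : filesInPriorityOrder.entrySet()) {
            long cost = singleEntryZipSize(e.getKey(), e.getValue());
            if (used + cost > maxBytes) {
                continue; // skip this file, a later smaller one may still fit
            }
            chosen.add(e.getKey());
            used += cost;
        }
        return chosen;
    }
}
```

Replacing `continue` with `break` would give the behaviour described in the question, where everything after the first file that does not fit is skipped.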

Maybe you could add one file at a time until you reach the 5 MB limit, and then discard the last file. Like @Gopi, I don't think there is any way to estimate it without actually compressing the file.

Of course, the file size will not increase (or maybe a little, because of the zip header?), so at least you have a "worst case" estimate.
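The "worst case" can be made a little more concrete by adding the fixed ZIP record sizes to the uncompressed file sizes. The sketch below is an assumption-laden illustration: `WorstCaseZipSize` and `upperBound` are hypothetical names, the constants assume a small archive written by a streaming writer such as ZipOutputStream with no extra fields, comments or Zip64 records, and it ignores the few bytes DEFLATE can add to data that does not compress at all.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical helper; the constants are the fixed record sizes of the ZIP format.
final class WorstCaseZipSize {
    static final int LOCAL_HEADER = 30;    // local file header, excluding the name
    static final int DATA_DESCRIPTOR = 16; // trailer written after streamed entry data
    static final int CENTRAL_HEADER = 46;  // central directory record, excluding the name
    static final int END_RECORD = 22;      // end of central directory record

    /** Rough upper bound, assuming no file shrinks at all when compressed. */
    static long upperBound(Map<String, Long> uncompressedSizesByName) {
        long total = END_RECORD;
        for (Map.Entry<String, Long> e : uncompressedSizesByName.entrySet()) {
            int nameBytes = e.getKey().getBytes(StandardCharsets.UTF_8).length;
            total += LOCAL_HEADER + nameBytes     // the name is stored in the local header
                   + e.getValue()                 // data, assumed incompressible
                   + DATA_DESCRIPTOR
                   + CENTRAL_HEADER + nameBytes;  // and again in the central directory
        }
        return total;
    }
}
```

If the bound for a set of files is already under 5 MB, the files can be zipped without any trial run; the bound says nothing useful about sets that only fit thanks to compression.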

Just wanted to share how we implemented this manually:

        int maxSizeForAllFiles = 70000; // Read from property
        int sizePerFile = 22000; // Read from property
        // Assumed declarations; the original snippet does not show them
        long totalFileSize = 0;
        boolean toBeZipped = false;
        // Iterate over the attachment list to check whether a ZIP is required
        for (String attachFile : inputAttachmentList) {
            File file = new File(attachFile);
            totalFileSize += file.length();
            // A single file at or above the per-file limit forces zipping
            if (file.length() >= sizePerFile) {
                toBeZipped = true;
                logger.info("File: "
                        + attachFile
                        + " Size: "
                        + file.length()
                        + " File required to be zipped, MAX allowed per file: "
                        + sizePerFile);
                break;
            }
        }
        // Check whether all attachments together reach maxSizeForAllFiles
        if (totalFileSize >= maxSizeForAllFiles) {
            toBeZipped = true;
        }
        if (toBeZipped) {
            // Zip here, iterating over all attachments
        }

I don't think there is any way to estimate the size of the zip that will be created, because zips are processed as streams. Also, it is not technically possible to predict the size of the compressed output unless you actually compress the data.

I did this once on a project with known input types. We knew that, generally speaking, our data compressed at around 5:1 (it was all text). So I'd check the file size and divide by 5...

In this case, the purpose was to check that files would likely be below a certain size. We only needed a rough estimate.
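The divide-by-a-known-ratio check might look like the sketch below. `RatioEstimate` and its methods are hypothetical names, and the 5:1 ratio is only the figure quoted above for that project's text data; it would have to be measured for any other kind of input.

```java
// Hypothetical helper for a rough, ratio-based guess; it does not compress anything.
final class RatioEstimate {

    /** Estimated zipped size, given an assumed ratio such as 5.0 for "5:1". */
    static long estimateZippedSize(long totalUncompressedBytes, double compressionRatio) {
        if (compressionRatio <= 0) {
            throw new IllegalArgumentException("compressionRatio must be positive");
        }
        return (long) Math.ceil(totalUncompressedBytes / compressionRatio);
    }

    /** True if the rough estimate is within maxBytes; this is a guess, not a guarantee. */
    static boolean likelyFits(long totalUncompressedBytes, double compressionRatio, long maxBytes) {
        return estimateZippedSize(totalUncompressedBytes, compressionRatio) <= maxBytes;
    }
}
```

For example, 15,000,000 bytes of text at an assumed 5:1 ratio gives an estimate of 3,000,000 bytes, which is under a 5 MB limit.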

All that said, I have noticed that zip applications like 7zip will create a zip file of a certain size (like a CD) and then split off to a new file once it reaches the limit. You could look at that source code. I have actually used the command-line version of that app from code before. They have a library you can use as well. Not sure how well that will integrate with Java, though.

For what it's worth, I've also used a library called SharpZipLib. It was very good. I wonder if there is a Java port of it.

There is a better option. Create a dummy LengthOutputStream that just counts the written bytes:

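import java.io.IOException;
import java.io.OutputStream;

// Discards everything written to it and only counts the bytes.
public class LengthOutputStream extends OutputStream {

    private long length = 0L;

    @Override
    public void write(int b) throws IOException {
        length++;
    }

    // Count whole buffers at once instead of falling back to one call per byte.
    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        length += len;
    }

    public long getLength() {
        return length;
    }
}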

You can simply connect the LengthOutputStream to a ZipOutputStream:

public static long sizeOfZippedDirectory(File dir) throws IOException {
    try (LengthOutputStream sos = new LengthOutputStream();
         ZipOutputStream zos = new ZipOutputStream(sos)) {
        ... // Add ZIP entries to the stream
        zos.finish(); // Write the central directory so it is included in the count
        return sos.getLength();
    }
}

The LengthOutputStream object counts the bytes of the zipped stream but stores nothing, so there is no file size limit. This method gives an accurate size, but it is almost as slow as creating the ZIP file.
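For a flat directory, the entry-adding step might be filled in as in the sketch below. It is only one possible reading: `ZipSizeProbe`, `CountingSink` and `sizeOfZippedFiles` are hypothetical names, subdirectories are ignored, and the nested counting stream plays the same role as the LengthOutputStream above so that the example is self-contained.

```java
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: zips the regular files of one directory into a counting sink.
final class ZipSizeProbe {

    /** Counts bytes and discards them, like the LengthOutputStream above. */
    static final class CountingSink extends OutputStream {
        long length;

        @Override
        public void write(int b) {
            length++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            length += len;
        }
    }

    /** Size the zip would have if the regular files directly inside dir were zipped. */
    static long sizeOfZippedFiles(File dir) throws IOException {
        File[] files = dir.listFiles(File::isFile);
        if (files == null) {
            throw new IOException("Not a readable directory: " + dir);
        }
        CountingSink sink = new CountingSink();
        try (ZipOutputStream zos = new ZipOutputStream(sink)) {
            for (File f : files) {
                zos.putNextEntry(new ZipEntry(f.getName()));
                Files.copy(f.toPath(), zos); // streams the file through the compressor
                zos.closeEntry();
            }
        }
        // Read the count only after the stream is closed, so that the
        // central directory written on close is part of the total.
        return sink.length;
    }
}
```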
