
How to estimate zip file size in Java before creating it

I have a requirement to create a zip file from a list of available files. The files are of different types, such as txt, pdf, and xml. I am using the Java util classes to do it.

The requirement is to keep the zip file within a maximum size of 5 MB. I should select files from the list based on timestamp and add them to the zip until its size reaches 5 MB, then skip the remaining files.

Is there a way in Java to estimate the zip file size in advance, without creating the actual file?

Or is there another approach to handle this?

Wrap your ZipOutputStream in a custom OutputStream, called YourOutputStream here.

  • The constructor of YourOutputStream creates a second ZipOutputStream (zos2) that wraps a new ByteArrayOutputStream (baos); the ZipOutputStream passed in is zos1.
    public YourOutputStream(ZipOutputStream zos, int maxSizeInBytes)
  • When you write a file with YourOutputStream, it first writes it to zos2.
    public void writeFile(File file) throws ZipFileFullException
    public void writeFile(String path) throws ZipFileFullException
    etc...
  • If baos.size() is under maxSizeInBytes
    • Write the file to zos1.
  • Else
    • Close zos1, baos, and zos2, and throw an exception. I can't think of an existing exception for this; if there is one, use it, otherwise create your own IOException subclass, ZipFileFullException.

You need two ZipOutputStreams: one that is written to your drive, and one to check whether your contents exceed 5 MB.

EDIT: In fact, I checked, and you can't easily remove a ZipEntry.
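The two-stream idea above could be sketched roughly as follows. This is only an illustration, not the answerer's actual code: the class name SizeLimitedZipWriter, the writeEntry method, and the use of in-memory byte arrays instead of File arguments are all assumptions. It also assumes entries without comments or extra fields, so that each entry later adds a central directory record of 46 bytes plus its name, and the archive ends with a 22-byte end record.

```java
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical exception, named after the one suggested in the answer. */
class ZipFileFullException extends IOException {
    ZipFileFullException(String message) {
        super(message);
    }
}

/** Sketch of "YourOutputStream": trial-writes each entry in memory before committing it. */
final class SizeLimitedZipWriter implements Closeable {
    private final ZipOutputStream zos1; // the real archive
    private final ByteArrayOutputStream baos = new ByteArrayOutputStream();
    private final ZipOutputStream zos2 = new ZipOutputStream(baos); // in-memory probe
    private final long maxSizeInBytes;
    private long directoryBytes = 22; // end-of-central-directory record (assumes no comment)
    private boolean full;

    SizeLimitedZipWriter(ZipOutputStream zos, long maxSizeInBytes) {
        this.zos1 = zos;
        this.maxSizeInBytes = maxSizeInBytes;
    }

    void writeEntry(String name, byte[] content) throws IOException {
        if (full) {
            throw new ZipFileFullException("archive is already full");
        }
        // Trial write: after closeEntry(), baos holds every entry byte written so far.
        writeTo(zos2, name, content);
        // Account for the central directory record this entry adds on close (assumption:
        // 46 bytes plus the UTF-8 name, no extra fields or comments).
        long newDirectoryBytes =
                directoryBytes + 46 + name.getBytes(StandardCharsets.UTF_8).length;
        if (baos.size() + newDirectoryBytes > maxSizeInBytes) {
            full = true; // the probe now contains the rejected entry, so stop here
            throw new ZipFileFullException(name + " would exceed " + maxSizeInBytes + " bytes");
        }
        directoryBytes = newDirectoryBytes;
        writeTo(zos1, name, content);
    }

    private static void writeTo(ZipOutputStream zos, String name, byte[] content)
            throws IOException {
        zos.putNextEntry(new ZipEntry(name));
        zos.write(content);
        zos.closeEntry();
    }

    @Override
    public void close() throws IOException {
        zos2.close();
        zos1.close();
    }
}
```

The price of this design is that every file is compressed twice and the probe keeps up to maxSizeInBytes in memory, which is acceptable for a 5 MB limit.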

http://download.oracle.com/javase/6/docs/api/java/io/ByteArrayOutputStream.html#size()

+1 for Colin Herbert: add files one by one, and either back up to the previous step or remove the last file if the archive is too big. I just want to add some details:

Prediction is far too unreliable. For example, a PDF can contain uncompressed text and compress down to 30% of the original size, or it can contain already-compressed text and images and compress only to 80%. You would need to inspect the entire PDF for compressibility, which basically means compressing it.

You could try a statistical prediction. That could reduce the number of failed attempts, but you would still have to implement the recommendation above. Go with the simpler implementation first, and see if it's enough.

Alternatively, compress the files individually, then pick the files that won't exceed 5 MB when bound together. If unpacking is automated too, you could bind the zip files into a single uncompressed zip file.
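A rough sketch of that last alternative: each file is zipped on its own, and the resulting small archives are then bundled as STORED (uncompressed) entries, whose size is predictable. The class and method names are hypothetical, and the cost formula assumes entries without comments or extra fields (a 30-byte local header and a 46-byte central directory record per entry, each followed by the name, plus one 22-byte end record).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class PrecompressedBundle {

    /** Compresses one file's bytes into its own small ZIP archive. */
    static byte[] zipSingle(String name, byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry(name));
            zos.write(data);
            zos.closeEntry();
        }
        return bytes.toByteArray();
    }

    /** Bytes one STORED entry adds: local header + data + central directory record. */
    static long storedEntryCost(String name, int length) {
        long nameBytes = name.getBytes(StandardCharsets.UTF_8).length;
        return 30 + nameBytes + length + 46 + nameBytes;
    }

    /** Bundles the inner archives in iteration order, stopping before the limit is exceeded. */
    static byte[] bundle(Map<String, byte[]> innerZips, long maxBytes) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        long total = 22; // end-of-central-directory record
        try (ZipOutputStream outer = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> inner : innerZips.entrySet()) {
                byte[] payload = inner.getValue();
                long cost = storedEntryCost(inner.getKey(), payload.length);
                if (total + cost > maxBytes) {
                    break; // skip the remaining files, as the question requires
                }
                total += cost;
                CRC32 crc = new CRC32();
                crc.update(payload);
                // STORED entries need size and CRC set before putNextEntry().
                ZipEntry entry = new ZipEntry(inner.getKey());
                entry.setMethod(ZipEntry.STORED);
                entry.setSize(payload.length);
                entry.setCompressedSize(payload.length);
                entry.setCrc(crc.getValue());
                outer.putNextEntry(entry);
                outer.write(payload);
                outer.closeEntry();
            }
        }
        return bytes.toByteArray();
    }
}
```

Each file is compressed only once here, but whoever unpacks the bundle has to unzip twice, which is why the answer ties this option to automated unpacking.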

Maybe you could add one file at a time until you reach the 5 MB limit, and then discard the last file. Like @Gopi, I don't think there is any way to estimate it without actually compressing the file.

Of course, zipping will not increase the total size (or maybe only a little, because of the zip headers?), so the sum of the original file sizes at least gives you a "worst case" estimate.

I just wanted to share how we implemented this manually:
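To put a number on "maybe a little", here is a hedged sketch of a worst-case bound. The per-entry overhead comes from the ZIP format as written by a streaming ZipOutputStream (30-byte local header, 16-byte data descriptor, 46-byte central directory record, names stored twice, one 22-byte end record). The allowance for incompressible data is borrowed from zlib's deflateBound formula and should be treated as an assumption, as should the absence of comments, extra fields, and entries over 4 GB. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

final class ZipWorstCase {

    /** Assumed upper bound on deflate output for n input bytes (modelled on zlib's deflateBound). */
    static long deflateUpperBound(long n) {
        return n + (n >> 12) + (n >> 14) + (n >> 25) + 13;
    }

    /** Worst-case archive size for the given entry names and uncompressed sizes. */
    static long upperBound(Map<String, Long> fileSizes) {
        long total = 22; // end-of-central-directory record
        for (Map.Entry<String, Long> file : fileSizes.entrySet()) {
            long nameBytes = file.getKey().getBytes(StandardCharsets.UTF_8).length;
            total += 30 + nameBytes                      // local file header
                    + deflateUpperBound(file.getValue()) // data that does not compress at all
                    + 16                                 // data descriptor
                    + 46 + nameBytes;                    // central directory record
        }
        return total;
    }
}
```

If this bound is already under 5 MB, every file can be added without checking; otherwise the files still have to be compressed to find out how many fit.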

        int maxSizeForAllFiles = 70000; // Read from property
        int sizePerFile = 22000; // Read from property
        long totalFileSize = 0; // Declared here so the snippet is complete
        boolean toBeZipped = false;
        /**
         * Iterate over the attachment list to check whether a ZIP is required
         */
        for (String attachFile : inputAttachmentList) {
            File file = new File(attachFile);
            totalFileSize += file.length();
            /**
             * A single file at or above the per-file limit forces zipping
             */
            if (file.length() >= sizePerFile) {
                toBeZipped = true;
                logger.info("File: "
                        + attachFile
                        + " Size: "
                        + file.length()
                        + " File required to be zipped, MAX allowed per file: "
                        + sizePerFile);
                break;
            }
        }
        /**
         * Check whether all attachments together reach maxSizeForAllFiles
         */
        if (totalFileSize >= maxSizeForAllFiles) {
            toBeZipped = true;
        }
        if (toBeZipped) {
            // Zip here, iterating over all attachments
        }

I don't think there is any way to estimate the size of the zip that will be created, because zips are processed as streams. It is also not technically possible to predict the size of the compressed output unless you actually compress the data.

I did this once on a project with known input types. We knew that, generally speaking, our data compressed around 5:1 (it was all text). So I'd check the file size and divide by 5...

In this case, the purpose for doing so was to check that files would likely be below a certain size. We only needed a rough estimate.
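The divide-by-ratio check described above amounts to very little code. The sketch below is only an illustration with hypothetical names; the 5:1 ratio is the one this answer quotes for its own all-text data and would have to be measured for any other kind of input.

```java
final class RatioEstimate {

    /** Rough compressed size, assuming the data compresses at about compressionRatio:1. */
    static long estimateZippedBytes(long rawBytes, double compressionRatio) {
        if (compressionRatio <= 0) {
            throw new IllegalArgumentException("compressionRatio must be positive");
        }
        return (long) Math.ceil(rawBytes / compressionRatio);
    }

    /** True if the rough estimate stays within maxBytes; the real archive may still differ. */
    static boolean likelyFits(long rawBytes, double compressionRatio, long maxBytes) {
        return estimateZippedBytes(rawBytes, compressionRatio) <= maxBytes;
    }
}
```

For example, 20,000,000 bytes of text at 5:1 is estimated at 4,000,000 bytes, which is under a 5 MB limit, while 30,000,000 bytes is estimated at 6,000,000 bytes and is not.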

All that said, I have noticed zip applications like 7zip will create a zip file of a certain size (like a CD) and then split the zip off to a new file once it reaches the limit. You could look at that source code. I have actually used the command line version of that app in code before. They have a library you can use as well. Not sure how well that will integrate with Java though.

For what it is worth, I've also used a library called SharpZipLib. It was very good. I wonder if there is a Java port of it.

There is a better option. Create a dummy LengthOutputStream that just counts the written bytes:

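import java.io.IOException;
import java.io.OutputStream;

public class LengthOutputStream extends OutputStream {

    private long length = 0L;

    @Override
    public void write(int b) throws IOException {
        length++;
    }

    // Count whole buffers at once instead of falling back to byte-by-byte writes
    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        length += len;
    }

    public long getLength() {
        return length;
    }
}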

You can then simply connect the LengthOutputStream to a ZipOutputStream:

public static long sizeOfZippedDirectory(File dir) throws FileNotFoundException, IOException {
    try (LengthOutputStream sos = new LengthOutputStream();
         ZipOutputStream zos = new ZipOutputStream(sos)) {
        ... // Add ZIP entries to the stream
        zos.finish(); // Write the central directory so it is included in the count
        return sos.getLength();
    }
}

The LengthOutputStream object counts the bytes of the zipped stream but stores nothing, so there is no file size limit. This method gives an accurate size estimate, but it is almost as slow as creating the ZIP file.
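As a self-contained illustration of the counting approach, the sketch below repeats a compact copy of LengthOutputStream and measures a set of in-memory entries. The ZipSizeProbe class, the zippedSize method, and the Map-based input are assumptions made for the example; the answer's own method takes a directory instead.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Compact copy of the counting stream from the answer: counts bytes, stores nothing. */
final class LengthOutputStream extends OutputStream {
    private long length;

    @Override
    public void write(int b) {
        length++;
    }

    @Override
    public void write(byte[] b, int off, int len) {
        length += len;
    }

    long getLength() {
        return length;
    }
}

final class ZipSizeProbe {

    /** Returns the size the ZIP would have, without keeping any of its bytes. */
    static long zippedSize(Map<String, byte[]> files) throws IOException {
        LengthOutputStream counter = new LengthOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(counter)) {
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                zos.putNextEntry(new ZipEntry(file.getKey()));
                zos.write(file.getValue());
                zos.closeEntry();
            }
        }
        // Read the counter only after close(), which writes the central directory.
        return counter.getLength();
    }
}
```

The result should match the length of a real archive built from the same entries with the same settings, since the only difference is where the bytes go.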
