
zip4j setFileNameCharset not working

I'm using zip4j to unzip files and I have a problem with the file name charset. This is my code:

try {
    ZipFile zipFile = new ZipFile(source);
    if (zipFile.isEncrypted()) {
        zipFile.setPassword(password);
    }
    System.out.println(System.getProperty("file.encoding"));
    zipFile.setFileNameCharset("UTF-8");
    zipFile.extractAll(destination);
} catch (ZipException e) {
    System.out.println(e.getMessage());
}

It works fine, but the extracted file names look like this: [image: enter image description here]

When you both compress and extract a zip file with zip4j, use the same charset for both operations.

(My test case: compress with UTF-8, extract with UTF-8, and it works; zip4j 1.3.2.)

If you want to extract a zip file created by other software, the file names may be encoded in that system's default charset (such as GBK, Shift-JIS, or another charset).

In this case, if one of the source file names contains a Unicode character that does not exist in that charset, the file name in that ZipEntry is converted to UTF-8.

To extract this kind of zip file, each file name must be converted by custom code, one by one.
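To make "a Unicode character which does not exist in that charset" concrete, here is a minimal sketch using only the JDK. The class and method names (`LegacyCharsetCheck`, `fitsIn`) are hypothetical and are not part of zip4j; the sketch only shows how to ask whether a name can be represented in a legacy charset, which is the situation that leads to mixed encodings inside one archive.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of zip4j.
class LegacyCharsetCheck {

    // Returns true if every character of the name can be encoded in the named charset.
    static boolean fitsIn(String name, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(name);
    }

    public static void main(String[] args) {
        String japanese = "\u65e5\u672c\u8a9e.txt";     // Japanese characters only
        String mixed = "\u65e5\u672c\u8a9e_\ud55c.txt"; // same name plus one Hangul syllable

        System.out.println(fitsIn(japanese, "Shift_JIS"));
        System.out.println(fitsIn(mixed, "Shift_JIS"));
        System.out.println(fitsIn(mixed, "UTF-8"));
    }
}
```

A name that fails this check for the archiver's default charset is the kind of entry that may end up stored as UTF-8 while its neighbours are not.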

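import java.util.Iterator;
import java.util.List;
import net.lingala.zip4j.core.ZipFile;              // zip4j 1.3.x packages
import net.lingala.zip4j.model.FileHeader;
import net.lingala.zip4j.model.UnzipParameters;

// dir is the destination directory path (a String)
ZipFile zipFile = new ZipFile("input.zip");
UnzipParameters param = new UnzipParameters();
// Read names as ISO8859-1 so the raw bytes can be recovered unchanged
zipFile.setFileNameCharset("ISO8859-1");
List list = zipFile.getFileHeaders();
for (Iterator iterator = list.iterator(); iterator.hasNext();) {
    FileHeader fh = (FileHeader) iterator.next();
    byte[] b = fh.getFileName().getBytes("ISO8859-1"); // raw name bytes
    String fname = null;
    try {
        fname = new String(b, "UTF-8");
        // If re-encoding changes the length, the bytes were not valid UTF-8
        if (fname.getBytes("UTF-8").length != b.length) {
            fname = new String(b, "GBK"); // most likely charset
        }
    } catch (Exception e) {
        // try another charset or ...
        System.err.println("Invalid file name: " + fh.getFileName());
    }
    zipFile.extractFile(fh, dir, param, fname);
}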
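The UTF-8-or-fallback decision in the loop above can also be isolated and tried without zip4j. The following is a minimal sketch under stated assumptions: `ZipNameDecoder`, `strictDecode` and `decodeName` are hypothetical names, and instead of the length comparison it uses a strict `CharsetDecoder` that reports malformed input, which is a swapped-in technique and not what the answer's code does. It also shows why the ISO8859-1 round trip is safe: every byte maps to exactly one char and back.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of zip4j.
class ZipNameDecoder {

    // Decodes strictly; returns null if the bytes are not valid in the given charset.
    static String strictDecode(byte[] raw, Charset cs) {
        CharsetDecoder decoder = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    // Tries UTF-8 first, then falls back to the given legacy charset (for example GBK).
    static String decodeName(byte[] raw, Charset fallback) {
        String utf8 = strictDecode(raw, StandardCharsets.UTF_8);
        return utf8 != null ? utf8 : new String(raw, fallback);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        String original = "\u4e2d\u6587.txt"; // a Chinese file name

        // Simulate what zip4j hands back when the charset is set to ISO8859-1
        byte[] stored = original.getBytes(gbk);
        String asLatin1 = new String(stored, StandardCharsets.ISO_8859_1);
        byte[] recovered = asLatin1.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(decodeName(recovered, gbk).equals(original));
        System.out.println(decodeName(original.getBytes(StandardCharsets.UTF_8), gbk).equals(original));
    }
}
```

This remains a heuristic: some byte sequences from a legacy charset happen to be valid UTF-8 as well, so a wrong guess is still possible, and the fallback charset has to be chosen for the archives you expect.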

Beck Yang's solution really works.

For anyone unzipping password-encrypted files with Chinese names: there is a bug in zip4j. You must call setFileNameCharset before isEncrypted and setPassword; otherwise the output encoding is wrong.

Also, entries that are directories only (empty directories) have the same encoding problem, and it cannot be fixed without modifying the zip4j source code.

Building on @Beck Yang's answer: use the Apache Tika library to auto-detect the charset, and you are good to go for any language.

import org.apache.tika.parser.txt.CharsetDetector;

...

// dir is the destination directory path (a String)
ZipFile zipFile = new ZipFile("input.zip");
UnzipParameters param = new UnzipParameters();
// Read names as ISO8859-1 so the raw bytes can be recovered unchanged
zipFile.setFileNameCharset("ISO8859-1");
List list = zipFile.getFileHeaders();

for (Iterator iterator = list.iterator(); iterator.hasNext();) {
    FileHeader fh = (FileHeader) iterator.next();
    byte[] b = fh.getFileName().getBytes("ISO8859-1"); // raw name bytes
    String fname = null;
    try {
        // Let Tika guess the charset from the raw bytes
        CharsetDetector charDetect = new CharsetDetector();
        charDetect.setText(b);
        String charSet = charDetect.detect().getName();
        fname = new String(b, charSet);
    } catch (Throwable e) {
        // Detection failed: fall back to the name as zip4j read it
        fname = fh.getFileName();
    }
    zipFile.extractFile(fh, dir, param, fname);
}

