
Java zip character encoding

I'm using the following method to compress a file into a zip file:

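import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;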

public static void doZip(final File inputfis, final File outputfis) throws IOException {

    FileInputStream fis = null;
    FileOutputStream fos = null;

    final CRC32 crc = new CRC32();
    crc.reset();

    try {
        fis = new FileInputStream(inputfis);
        fos = new FileOutputStream(outputfis);
        final ZipOutputStream zos = new ZipOutputStream(fos);
        zos.setLevel(6);
        final ZipEntry ze = new ZipEntry(inputfis.getName());
        zos.putNextEntry(ze);
        final int BUFSIZ = 8192;
        final byte inbuf[] = new byte[BUFSIZ];
        int n;
        while ((n = fis.read(inbuf)) != -1) {
            zos.write(inbuf, 0, n);
            crc.update(inbuf, 0, n); // only the n bytes just read, not the whole buffer
        }
        ze.setCrc(crc.getValue());
        zos.finish();
        zos.close();
    } catch (final IOException e) {
        throw e;
    } finally {
        if (fis != null) {
            fis.close();
        }
        if (fos != null) {
            fos.close();
        }
    }
}

My problem is that I have flat text files with content such as N°TICKET, and after unzipping, the result contains weird characters, for example N° TICKET. Characters such as é and à are not preserved either.

I guess it's due to the character encoding, but I don't know how to set it to ISO-8859-1 in my zip method.

(I'm running Java 6 on Windows 7.)

AFAIK this is not available in Java 6.

However, I believe Apache Commons Compress (http://commons.apache.org/compress/) can provide a solution.

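Switching to Java 7 provides a new constructor that takes the encoding as an additional parameter. Note that this charset is used for entry names and comments inside the archive, not for the contents of the zipped files.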

https://blogs.oracle.com/xuemingshen/entry/non_utf_8_encoding_in

zipStream = new ZipInputStream(
    new BufferedInputStream(new FileInputStream(archiveFile), BUFFER_SIZE),
    Charset.forName("ISO-8859-1")
);
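As a rough illustration of the Java 7 constructors, the sketch below writes an in-memory archive whose entry name contains non-ASCII characters, using ZipOutputStream(OutputStream, Charset), and reads the name back with the matching ZipInputStream(InputStream, Charset). The class name, method name and entry name are invented for the example. According to the Javadoc, the charset applies to entry names and comments; the bytes of the file contents are not converted.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, requires Java 7 or later
class ZipNameCharsetDemo {

    // Writes a single entry whose name is encoded with the given charset,
    // then reads the archive back with the same charset and returns the name.
    static String roundTripEntryName(String entryName, Charset charset) {
        try {
            ByteArrayOutputStream zipped = new ByteArrayOutputStream();
            ZipOutputStream zos = new ZipOutputStream(zipped, charset);
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write("ok".getBytes(charset)); // placeholder content
            zos.closeEntry();
            zos.close();

            ZipInputStream zis = new ZipInputStream(
                new ByteArrayInputStream(zipped.toByteArray()), charset);
            ZipEntry entry = zis.getNextEntry();
            String name = entry.getName();
            zis.close();
            return name;
        } catch (IOException e) {
            // Not expected for in-memory streams
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Charset latin1 = Charset.forName("ISO-8859-1");
        // "résumé.txt", written with \u escapes so the source encoding does not matter
        String name = "r\u00E9sum\u00E9.txt";
        System.out.println(name.equals(roundTripEntryName(name, latin1)));
    }
}
```

The writer and the reader have to agree on the charset; the archive itself does not record which non-UTF-8 charset was used for the names.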

You are using streams, which write exactly the bytes they are given. Writers interpret character data and convert it to the corresponding bytes, and Readers do the opposite. Java (at least in version 6) doesn't provide an easy way to mix and match operations on zipped data with writing characters.

The following approach works, although it is a little clunky.

File inputFile = new File("utf-8-data.txt");
File outputFile = new File("latin-1-data.zip");

ZipEntry entry = new ZipEntry("latin-1-data.txt");

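// FileReader would use the platform default charset, so name the input charset explicitly
BufferedReader reader = new BufferedReader(
    new InputStreamReader(new FileInputStream(inputFile), Charset.forName("UTF-8"))
);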

ZipOutputStream zipStream = new ZipOutputStream(new FileOutputStream(outputFile));
BufferedWriter writer = new BufferedWriter(
    new OutputStreamWriter(zipStream, Charset.forName("ISO-8859-1"))
);

zipStream.putNextEntry(entry);

// this is the important part:
// all character data is written via the writer and not the zip output stream
String line = null;
while ((line = reader.readLine()) != null) {
    writer.append(line).append('\n');
}
writer.flush(); // I've used a buffered writer, so make sure to flush to the
                // underlying zip output stream

zipStream.closeEntry();
zipStream.finish();

reader.close(); 
writer.close();
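To check the point about streams passing bytes through untouched, here is a small sketch that zips and unzips a byte array in memory and then decodes the result with two different charsets. It assumes Java 7 or later (for StandardCharsets), and the class name, method name and entry name are made up for the example. If the bytes come back identical, the garbling has to come from decoding them with a charset other than the one they were written in.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class
class EncodingRoundTrip {

    // Zips the given bytes as a single entry in memory and unzips them again.
    static byte[] zipAndUnzip(byte[] content) {
        try {
            ByteArrayOutputStream zipped = new ByteArrayOutputStream();
            ZipOutputStream zos = new ZipOutputStream(zipped);
            zos.putNextEntry(new ZipEntry("ticket.txt")); // hypothetical entry name
            zos.write(content);
            zos.closeEntry();
            zos.close();

            ZipInputStream zis = new ZipInputStream(
                new ByteArrayInputStream(zipped.toByteArray()));
            zis.getNextEntry();
            ByteArrayOutputStream unzipped = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = zis.read(buf)) != -1) {
                unzipped.write(buf, 0, n);
            }
            zis.close();
            return unzipped.toByteArray();
        } catch (IOException e) {
            // Not expected for in-memory streams
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "N°TICKET", written with a \u escape so the source encoding cannot interfere
        String original = "N\u00B0TICKET";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        byte[] unzipped = zipAndUnzip(utf8);

        System.out.println("bytes identical after round trip: "
            + Arrays.equals(utf8, unzipped));

        // Decoding with the charset the bytes were written in restores the text
        String right = new String(unzipped, StandardCharsets.UTF_8);
        // Decoding as ISO-8859-1 turns the two-byte degree sign into two characters
        String wrong = new String(unzipped, StandardCharsets.ISO_8859_1);
        System.out.println("decoded as UTF-8 matches: " + right.equals(original));
        System.out.println("length when decoded as ISO-8859-1: " + wrong.length()
            + ", original length: " + original.length());
    }
}
```

In other words, transcoding is only needed when the consumer of the unzipped file expects a different charset than the one the source file was saved in.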

Try using org.apache.commons.compress.archivers.zip.ZipFile instead of Java's own library, so that you can specify the encoding like this:

import org.apache.commons.compress.archivers.zip.ZipFile;

ZipFile zipFile = new ZipFile(filepath, encoding);
