Compress and Decompress String in Java
I'm trying to compress and decompress a string in a producer/consumer environment that accepts only strings as parameters.
So after I compress a string, I convert the compressed byte array to a string and pass it to the producer. Then, on the consumer side, I take the string back, convert it into a byte array, and decompress the string from those bytes.
If I pass the byte[] directly instead of converting it to a string, it works fine. But what I need is to convert to a string and back.
Here is my code:
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class Compression {
    public static void main(String[] args) throws Exception {
        String strToCompress = "Helloo!! ";
        byte[] compressedBytes = compress(strToCompress);
        // Round-trip the compressed bytes through a String, as the producer/consumer channel requires
        String compressedStr = new String(compressedBytes, StandardCharsets.UTF_8);
        byte[] bytesToDecompress = compressedStr.getBytes(StandardCharsets.UTF_8);
        String decompressedStr = decompress(bytesToDecompress);
        System.out.println("Compressed Bytes : " + Arrays.toString(compressedBytes));
        System.out.println("Decompressed String : " + decompressedStr);
    }

    public static byte[] compress(final String str) throws IOException {
        if ((str == null) || (str.length() == 0)) {
            return null;
        }
        ByteArrayOutputStream obj = new ByteArrayOutputStream();
        GZIPOutputStream gzip = new GZIPOutputStream(obj);
        gzip.write(str.getBytes("UTF-8"));
        gzip.flush();
        gzip.close();
        return obj.toByteArray();
    }

    public static String decompress(final byte[] compressed) throws IOException {
        final StringBuilder outStr = new StringBuilder();
        if ((compressed == null) || (compressed.length == 0)) {
            return "";
        }
        if (isCompressed(compressed)) { // It is not going into this if part
            final GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(compressed));
            final BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(gis, "UTF-8"));
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                outStr.append(line);
            }
        } else {
            outStr.append(compressed);
        }
        return outStr.toString();
    }

    // Checks the two GZIP magic bytes (0x1f, 0x8b) at the start of the data
    public static boolean isCompressed(final byte[] compressed) {
        return (compressed[0] == (byte) (GZIPInputStream.GZIP_MAGIC)) && (compressed[1] == (byte) (GZIPInputStream.GZIP_MAGIC >> 8));
    }
}
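You can't assume the compressed bytes can be treated as UTF-8, because many possible byte combinations are not valid UTF-8. When the String is built, invalid sequences are replaced with the replacement character U+FFFD, so converting back gives different bytes (including a damaged GZIP magic number, which is why your isCompressed check fails). I suggest trying ISO-8859-1, which maps each of the 256 byte values to exactly one character and back, so all 8-bit values pass through untranslated.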
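A minimal sketch of the difference, assuming the same GZIP setup as in the question: it round-trips the compressed bytes through a String with each charset and checks whether the bytes survive unchanged. The class `CharsetRoundTrip` and its helpers `gzip` and `roundTrip` are illustrative names, not from the original post.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPOutputStream;

public class CharsetRoundTrip {
    // Illustrative helper: GZIP-compresses the UTF-8 bytes of a string.
    static byte[] gzip(String s) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(s.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }

    // Decodes the bytes into a String with the given charset, then encodes them back.
    static byte[] roundTrip(byte[] bytes, Charset cs) {
        return new String(bytes, cs).getBytes(cs);
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip("Helloo!! ");

        // The second magic byte 0x8b is not valid UTF-8 on its own, so it is
        // replaced by U+FFFD and re-encoded as EF BF BD: prints false.
        System.out.println("UTF-8 preserves bytes:      "
                + Arrays.equals(compressed, roundTrip(compressed, StandardCharsets.UTF_8)));

        // ISO-8859-1 maps every byte 0x00-0xFF to one char and back: prints true.
        System.out.println("ISO-8859-1 preserves bytes: "
                + Arrays.equals(compressed, roundTrip(compressed, StandardCharsets.ISO_8859_1)));
    }
}
```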
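Also note that while large pieces of text should get smaller, small strings can get larger: GZIP adds a 10-byte header and an 8-byte trailer, so a short input like "Helloo!! " ends up as more bytes than it started with.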
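To check this yourself, a small sketch (the class `GzipOverhead` and method `compressedSize` are illustrative names) that prints the compressed size of a short string and of a long repetitive one. It uses `String.repeat`, which assumes Java 11 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipOverhead {
    // Returns the GZIP-compressed size of the string's UTF-8 bytes.
    static int compressedSize(String s) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(s.getBytes(StandardCharsets.UTF_8));
        }
        return bos.size();
    }

    public static void main(String[] args) throws IOException {
        String small = "Helloo!! ";
        String large = "Helloo!! ".repeat(1000); // String.repeat needs Java 11+
        // The small input grows (header + trailer alone are 18 bytes);
        // the large, repetitive input shrinks substantially.
        System.out.println(small.length() + " chars -> " + compressedSize(small) + " bytes");
        System.out.println(large.length() + " chars -> " + compressedSize(large) + " bytes");
    }
}
```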
Note: this loop will strip any newline characters, because readLine() returns each line without its line terminator:
String line;
while ((line = bufferedReader.readLine()) != null) {
    outStr.append(line);
}
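To see the effect in isolation, here is a small sketch that runs the same kind of loop over a StringReader instead of a GZIP stream; the class `ReadLineDemo` and method `readLines` are illustrative names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class ReadLineDemo {
    // Same pattern as the question's loop: each line is appended without its terminator.
    static String readLines(String text) throws IOException {
        StringBuilder out = new StringBuilder();
        BufferedReader reader = new BufferedReader(new StringReader(text));
        String line;
        while ((line = reader.readLine()) != null) {
            out.append(line);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readLines("first\nsecond\n")); // prints "firstsecond"
    }
}
```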
I suggest copying with a char[] buffer instead, which won't drop any characters:
// reader: the Reader over the decompressed GZIP stream (bufferedReader in the question)
char[] chars = new char[512];
for (int len; (len = reader.read(chars)) > 0; ) {
    outStr.append(chars, 0, len);
}
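Putting both suggestions together, a minimal sketch that carries the GZIP bytes as an ISO-8859-1 string and decompresses with the char[] loop. The class `StringCompression` and the methods `compressToString` and `decompressFromString` are hypothetical names, not from the question; the sketch assumes the producer/consumer channel passes every char through unchanged.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class StringCompression {
    // Compresses the text and wraps the raw GZIP bytes in an ISO-8859-1 String
    // (one char per byte) so it can travel through a String-only channel.
    static String compressToString(String text) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return new String(bos.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    // Recovers the exact bytes via ISO-8859-1, then decompresses them with a
    // char[] copy loop so no characters (including newlines) are lost.
    static String decompressFromString(String compressed) throws IOException {
        byte[] bytes = compressed.getBytes(StandardCharsets.ISO_8859_1);
        StringBuilder out = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(bytes)), StandardCharsets.UTF_8)) {
            char[] chars = new char[512];
            for (int len; (len = reader.read(chars)) > 0; ) {
                out.append(chars, 0, len);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        String original = "Helloo!!\nsecond line\n";
        String restored = decompressFromString(compressToString(original));
        System.out.println(original.equals(restored)); // prints true
    }
}
```

The ISO-8859-1 string can contain control characters, so this only works if the channel does not alter them; if it might, java.util.Base64 is a common alternative, at the cost of roughly a third more size.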