Compression of strings and symbols

I have a string of roughly 200 characters, containing both letters and symbols, and I would like to compress it using any suitable algorithm.

Any programs, code, or algorithms would be a help.

Thanks in advance.

Currently I am using the code below, but when the string contains symbols it throws an array index out of bounds exception.

**COMPRESSION**
byte[] encode(String txt, int bit){
    int length = txt.length();
    // Size ratio of the output: 6-bit codes pack 4 characters into 3 bytes,
    // 5-bit codes pack 8 characters into 5 bytes.
    float tmpRet1=0,tmpRet2=0;
    if(bit==6){
        tmpRet1=3.0f;
        tmpRet2=4.0f;
    }else if(bit==5){
        tmpRet1=5.0f;
        tmpRet2=8.0f;
    }
    byte encoded[]=new byte[(int)(tmpRet1*Math.ceil(length/tmpRet2))];
    char str[]=new char[length];
    txt.getChars(0,length,str,0);
    int chaVal = 0;
    String temp;
    String strBinary = new String("");
    // Convert every character to a binary string (toValue is a helper not shown in this post).
    for (int i = 0;i<length; i++){
        temp = Integer.toBinaryString(toValue(str[i]));
        // Left-pad with zeros until the length is a multiple of `bit`.
        while(temp.length()%bit != 0){
            temp="0"+temp;
        }
        strBinary=strBinary+temp;
    }
    // Right-pad the whole bit string to a multiple of 8.
    while(strBinary.length()%8 != 0){
        strBinary=strBinary+"0";
    }
    // Pack each group of 8 bits into one output byte.
    Integer tempInt =new Integer(0);
    for(int i=0 ; i<strBinary.length();i=i+8){
        tempInt = tempInt.valueOf(strBinary.substring(i,i+8),2);
        encoded[i/8]=tempInt.byteValue();
    }
    return encoded;
}



**DECOMPRESSION**

String decode(byte[] encoded, int bit){
    String strTemp = new String("");
    String strBinary = new String("");
    String strText = new String("");
    Integer tempInt =new Integer(0);
    int intTemp=0;
    // Expand every byte back into an 8-digit binary string.
    for(int i = 0;i<encoded.length;i++){
        // Java bytes are signed, so map negative values to 128..255.
        if(encoded[i]<0){
            intTemp = (int)encoded[i]+256;
        }else
            intTemp = (int)encoded[i];
        strTemp = Integer.toBinaryString(intTemp);
        while(strTemp.length()%8 != 0){
            strTemp="0"+strTemp;
        }
        strBinary = strBinary+strTemp;
    }
    // Read the bit string back in groups of `bit` (toChar is a helper not shown in this post).
    for(int i=0 ; i<strBinary.length();i=i+bit){
        tempInt = tempInt.valueOf(strBinary.substring(i,i+bit),2);
        strText = strText + toChar(tempInt.intValue());
    }
    return strText;
}

Once, while I was studying, my teacher had me code a text compressor (cool homework). The basic idea was: if each character takes 8 bits, find the characters that appear most often and assign them shorter bit sequences, while assigning longer bit sequences to the characters that appear less often.

Example:

A = 01010101, B = 10101010

Uncompressed: AAAB - 01010101 01010101 01010101 10101010

Compressed:

A appears 3 times (it should have the shorter representation); B appears 1 time (it should have the longer representation).

A - 01

B - 10

Result: 01 01 01 10

So, you generate a series of bits for each letter in such a way that no letter's representation can be mistaken for the start of another letter's representation. Then you store the generated scheme in the compressed file. To decompress, read the scheme back from the compressed file and then read the data bit by bit.

Look here for details: http://web.stonehill.edu/compsci//LC/TEXTCOMPRESSION.htm
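The scheme described above is what is usually called a prefix code, and Huffman's algorithm is one common way to build it. The following is only a minimal sketch, not the homework solution mentioned in this answer: the class name `HuffmanSketch` and its methods are made up for illustration, and the bits are kept as a `String` of `'0'`/`'1'` characters for readability instead of being packed into bytes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical illustration class; not taken from the original post.
class HuffmanSketch {

    // Tree node: a leaf carries a symbol, an inner node carries two children.
    static final class Node implements Comparable<Node> {
        final char symbol;
        final int freq;
        final Node left, right;

        Node(char symbol, int freq, Node left, Node right) {
            this.symbol = symbol; this.freq = freq; this.left = left; this.right = right;
        }
        boolean isLeaf() { return left == null; }
        @Override public int compareTo(Node o) { return Integer.compare(freq, o.freq); }
    }

    // Build the scheme (symbol -> bit string) from the character frequencies.
    static Map<Character, String> buildCodes(String text) {
        Map<Character, Integer> freq = new HashMap<>();
        for (char c : text.toCharArray()) freq.merge(c, 1, Integer::sum);

        PriorityQueue<Node> queue = new PriorityQueue<>();
        for (Map.Entry<Character, Integer> e : freq.entrySet())
            queue.add(new Node(e.getKey(), e.getValue(), null, null));

        // Repeatedly merge the two least frequent nodes into one subtree.
        while (queue.size() > 1) {
            Node a = queue.poll(), b = queue.poll();
            queue.add(new Node('\0', a.freq + b.freq, a, b));
        }

        Map<Character, String> codes = new HashMap<>();
        if (!queue.isEmpty()) assign(queue.poll(), "", codes);
        return codes;
    }

    // Walk the tree: going left appends 0, going right appends 1.
    static void assign(Node n, String prefix, Map<Character, String> codes) {
        if (n.isLeaf()) {
            // An input with a single distinct symbol would otherwise get an empty code.
            codes.put(n.symbol, prefix.isEmpty() ? "0" : prefix);
            return;
        }
        assign(n.left, prefix + "0", codes);
        assign(n.right, prefix + "1", codes);
    }

    static String encode(String text, Map<Character, String> codes) {
        StringBuilder bits = new StringBuilder();
        for (char c : text.toCharArray()) bits.append(codes.get(c));
        return bits.toString();
    }

    // Read bit by bit; because no code is a prefix of another,
    // the first match is always the right symbol.
    static String decode(String bits, Map<Character, String> codes) {
        Map<String, Character> reverse = new HashMap<>();
        for (Map.Entry<Character, String> e : codes.entrySet())
            reverse.put(e.getValue(), e.getKey());

        StringBuilder out = new StringBuilder();
        StringBuilder current = new StringBuilder();
        for (char bit : bits.toCharArray()) {
            current.append(bit);
            Character c = reverse.get(current.toString());
            if (c != null) { out.append(c); current.setLength(0); }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "AAAB";
        Map<Character, String> codes = buildCodes(text);
        String bits = encode(text, codes);
        System.out.println(codes + " " + bits + " " + decode(bits, codes));
    }
}
```

For a string of only about 200 characters, keep in mind that the stored scheme itself takes space, so the saving may be smaller than the bit counts suggest.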

You could use a GZIPOutputStream for compression and a GZIPInputStream for decompression.

If you want to do it in memory, wrap a ByteArrayOutputStream with the GZIPOutputStream when compressing, and wrap a ByteArrayInputStream with the GZIPInputStream when decompressing.

See the link below:

http://docs.oracle.com/javase/1.5.0/docs/api/java/util/zip/GZIPOutputStream.html
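A minimal in-memory sketch of this approach could look like the following. The class name `GzipSketch` and the choice of UTF-8 are assumptions made for the example; it also uses try-with-resources and `UncheckedIOException`, so it needs Java 8 or later rather than the 1.5 release the link above documents.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration class; not taken from the original post.
class GzipSketch {

    static byte[] compress(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the GZIP stream flushes the remaining data and writes the trailer.
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static String decompress(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[1024];
            int n;
            // read() returns -1 once the end of the compressed data is reached.
            while ((n = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "Symbols work too: !@#$%^&*() and so do letters, letters, letters.";
        byte[] packed = compress(text);
        System.out.println(text.length() + " chars -> " + packed.length + " bytes");
        System.out.println(decompress(packed).equals(text));
    }
}
```

Because the GZIP format adds a fixed header and trailer to every stream, a string of around 200 characters may shrink only slightly, or not at all, unless it is quite repetitive; it is worth measuring on real input.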
