
Run-length encoding of arbitrary-length substrings

What is an efficient run-length encoding algorithm (in terms of time complexity) for an input stream of arbitrary but finite length? An algorithm for substrings of length 1 can be implemented in C as:

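#include <stdio.h>

/* Print each run of identical characters as "<count><char>". */
void encoding(const char *bytes)
{
    const char *s = bytes;

    while (*s) {
        int c = 1;      /* length of the current run */
        char ch = *s;   /* character being repeated */

        /* advance while the next character continues the run */
        while (*s == *(s + 1)) {
            c++;
            s++;
        }
        printf("%d%c", c, ch);
        s++;            /* step past the last character of the run */
    }
}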

However, I am looking for a better algorithm that can encode repeated substrings of any length. For example, for the input "abbabb" the code above prints "1a2b1a2b", but a better algorithm could encode it as "2abb" (two repetitions of "abb").

The implementation language does not matter much (I would prefer C or Python); I am only looking for the algorithm.
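For comparison with the run-length code above, here is a naive greedy sketch that reproduces the "2abb" example. It assumes the same output format as the question (a count followed by the repeated unit with no delimiter, which is not uniquely decodable), and the names `encode_repeats` and `count_repeats` are hypothetical. At each position it tries every unit length p up to half of the remaining input, counts how many times that unit repeats back to back, and keeps the choice that saves the most characters. It is not optimal and can take roughly cubic time in the worst case, so treat it as an illustration rather than an efficient answer.

```c
#include <stdio.h>
#include <string.h>

/* Count how many times the unit s[0..p) repeats back to back
   within the first len characters of s (always at least 1). */
static size_t count_repeats(const char *s, size_t len, size_t p)
{
    size_t k = 1;
    while ((k + 1) * p <= len && memcmp(s, s + k * p, p) == 0)
        k++;
    return k;
}

/* Hypothetical greedy encoder: at each position pick the repeated unit
   that saves the most characters and append "<count><unit>" to out.
   Assumes out_size > 0; output is truncated if the buffer is too small. */
void encode_repeats(const char *s, char *out, size_t out_size)
{
    size_t n = strlen(s), i = 0, used = 0;
    out[0] = '\0';
    while (i < n) {
        size_t rest = n - i;
        size_t best_p = 1, best_k = count_repeats(s + i, rest, 1);
        for (size_t p = 2; 2 * p <= rest; p++) {
            size_t k = count_repeats(s + i, rest, p);
            /* writing the unit once instead of k times saves (k-1)*p chars;
               strict '>' keeps the shorter unit on ties */
            if (k >= 2 && (k - 1) * p > (best_k - 1) * best_p) {
                best_p = p;
                best_k = k;
            }
        }
        int w = snprintf(out + used, out_size - used, "%zu%.*s",
                         best_k, (int)best_p, s + i);
        if (w < 0 || (size_t)w >= out_size - used)
            return; /* buffer too small: output stays truncated */
        used += (size_t)w;
        i += best_k * best_p;
    }
}
```

With this sketch, "abbabb" becomes "2abb", while "aaab" becomes "3a1b", the same as the original function.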

Any algorithm that can find repeated substrings of a given length can be used to implement Lempel-Ziv compression with a sliding window of that length.

So I would look into Lempel-Ziv encoders and use one of them. Or, even better, drop run-length encoding altogether and implement Lempel-Ziv: it can only provide better compression.
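To make the Lempel-Ziv suggestion concrete, here is a minimal LZ77 sketch, assuming a brute-force match search and an arbitrary window size; `WINDOW`, `Token`, `lz77_encode` and `lz77_decode` are hypothetical names, not a library API. Each token is an (offset, length, next character) triple. Because a match may overlap the position being encoded, a substring repeated back to back costs a single token, which covers the arbitrary-length runs the question asks about.

```c
#include <string.h>

#define WINDOW 4096 /* hypothetical sliding-window size */

typedef struct {
    size_t offset; /* distance back to the match start (0 = no match) */
    size_t length; /* number of characters copied from the match */
    char next;     /* literal character that follows the match */
} Token;

/* Naive LZ77: for each position, search the previous WINDOW characters
   for the longest match. Matches may run past the current position,
   which is what turns back-to-back repeats into one token.
   Returns the number of tokens written (at most max_tokens). */
size_t lz77_encode(const char *s, Token *out, size_t max_tokens)
{
    size_t n = strlen(s), i = 0, t = 0;
    while (i < n && t < max_tokens) {
        size_t best_len = 0, best_off = 0;
        size_t start = i > WINDOW ? i - WINDOW : 0;
        size_t max_len = n - i - 1; /* keep one char for the literal */
        for (size_t j = start; j < i; j++) {
            size_t len = 0;
            while (len < max_len && s[j + len] == s[i + len])
                len++;
            if (len > best_len) {
                best_len = len;
                best_off = i - j;
            }
        }
        out[t].offset = best_off;
        out[t].length = best_len;
        out[t].next = s[i + best_len];
        t++;
        i += best_len + 1;
    }
    return t;
}

/* Rebuild the string; out must hold the original length + 1 bytes. */
void lz77_decode(const Token *tok, size_t count, char *out)
{
    size_t pos = 0;
    for (size_t t = 0; t < count; t++) {
        /* copy one byte at a time so overlapping references work */
        for (size_t k = 0; k < tok[t].length; k++, pos++)
            out[pos] = out[pos - tok[t].offset];
        out[pos++] = tok[t].next;
    }
    out[pos] = '\0';
}
```

For "abcabcabcabc" the encoder emits four tokens: three literals followed by (3, 8, 'c'), which copies eight characters starting three positions back (the copy overlaps itself) and then appends 'c'. The brute-force search is slow on large inputs; production encoders such as zlib's DEFLATE speed it up with hash chains.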

