Run length encoding of arbitrary length substrings

Question

What is an efficient (in terms of time complexity) run-length encoding algorithm for an input stream of arbitrary but finite length? An algorithm for substrings of length 1 can be implemented in C as:

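#include <stdio.h>

/* Prints each run of identical characters as <count><char>. */
void encoding(const char *bytes)
{
    const char *s = bytes;

    while (*s) {
        int c = 1;        /* length of the current run */
        char ch = *s;     /* character being repeated */

        /* Advance while the next character continues the run. */
        while (*s && *s == *(s + 1)) {
            c++;
            s++;
        }
        printf("%d%c", c, ch);
        s++;              /* step past the last character of the run */
    }
}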

However, I am looking for a better algorithm that can encode repeated substrings of any length. For example, for the input "abbabb" the code above prints "1a2b1a2b", whereas a better algorithm could encode it as "2abb".
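A brute-force greedy sketch of the "better algorithm" described above, assuming the hypothetical helper name encode_substring_rle and the question's own "<count><unit>" output format. At each position it tries every unit length whose two copies still fit in the remaining input, counts consecutive copies of that unit, and keeps the unit that saves the most characters; when nothing repeats it falls back to a single character with count 1, as the original code does. It is greedy and can take up to cubic time in the worst case, so it is not guaranteed to find the shortest encoding.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical greedy encoder: at each position, pick the repeating unit
 * that saves the most characters ((copies - 1) * unit_len) and write it
 * into out (capacity out_size) as "<count><unit>". */
static void encode_substring_rle(const char *s, char *out, size_t out_size)
{
    assert(s != NULL && out != NULL);
    size_t n = strlen(s);
    size_t i = 0, used = 0;

    if (out_size == 0)
        return;
    out[0] = '\0';

    while (i < n) {
        size_t best_len = 1, best_count = 1, best_saved = 0;
        size_t rest = n - i;

        /* A unit can only repeat if two copies fit in the remaining input. */
        for (size_t len = 1; len <= rest / 2; len++) {
            size_t count = 1;
            /* Count consecutive copies of s[i .. i+len) starting at i. */
            while (i + (count + 1) * len <= n &&
                   memcmp(s + i, s + i + count * len, len) == 0)
                count++;
            size_t saved = (count - 1) * len;
            if (saved > best_saved) {
                best_saved = saved;
                best_len = len;
                best_count = count;
            }
        }

        int w = snprintf(out + used, out_size - used, "%zu%.*s",
                         best_count, (int)best_len, s + i);
        if (w < 0 || (size_t)w >= out_size - used)
            return; /* buffer too small: output is truncated */
        used += (size_t)w;
        i += best_count * best_len;
    }
}
```

For example, encoding "abbabb" into a 64-byte buffer yields "2abb", "aaaa" yields "4a", and "abc" yields "1a1b1c". Note that this format does not record where the repeated unit ends, so a decodable variant would need a delimiter or an explicit unit length.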

The implementation language is not an issue (C or Python would be my choice), since an algorithm is all I am looking for.

Answer 1

Any algorithm that can find repeated substrings of a given length can be used to implement Lempel-Ziv compression with a sliding window of that length.

So I would look into Lempel-Ziv encoders and use one of those. Or, even better, drop run-length encoding and implement Lempel-Ziv directly: a run is just a repeated substring, so Lempel-Ziv can generally match or beat run-length encoding on compression.
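A minimal LZ77-style sketch, assuming a brute-force longest-match search, a hypothetical 4096-byte window LZ_WINDOW, and (distance, length, next) tokens; the names lz_token, lz77_encode and lz77_decode are hypothetical. It shows why Lempel-Ziv subsumes run-length encoding: a run of repeated characters becomes one back-reference whose distance is smaller than its length, and the decoder copies byte by byte so the overlap works. Production encoders (e.g. DEFLATE implementations) replace the brute-force search with hash chains.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LZ_WINDOW 4096 /* hypothetical sliding-window size */

/* One token: copy `length` bytes from `distance` back, then emit `next`. */
struct lz_token {
    size_t distance; /* 0 when there is no match */
    size_t length;
    char next;
};

/* Encodes s[0..n) into out (room for n tokens, the worst case).
 * Returns the number of tokens written. Brute-force match search. */
static size_t lz77_encode(const char *s, size_t n, struct lz_token *out)
{
    size_t i = 0, count = 0;
    while (i < n) {
        size_t best_len = 0, best_dist = 0;
        size_t start = i > LZ_WINDOW ? i - LZ_WINDOW : 0;
        for (size_t j = start; j < i; j++) {
            size_t len = 0;
            /* The match may run past i (overlap); that is how runs are
             * encoded. Stop one byte early so a literal `next` exists. */
            while (i + len < n - 1 && s[j + len] == s[i + len])
                len++;
            if (len > best_len) {
                best_len = len;
                best_dist = i - j;
            }
        }
        out[count].distance = best_dist;
        out[count].length = best_len;
        out[count].next = s[i + best_len];
        count++;
        i += best_len + 1;
    }
    return count;
}

/* Decodes tokens into out (must hold the original length).
 * Returns the number of bytes written. */
static size_t lz77_decode(const struct lz_token *t, size_t count, char *out)
{
    size_t pos = 0;
    for (size_t k = 0; k < count; k++) {
        assert(t[k].length == 0 ||
               (t[k].distance > 0 && t[k].distance <= pos));
        /* Byte-by-byte copy so overlapping matches (distance < length) work. */
        for (size_t m = 0; m < t[k].length; m++, pos++)
            out[pos] = out[pos - t[k].distance];
        out[pos++] = t[k].next;
    }
    return pos;
}
```

For example, encoding "aaaaaaaa" (eight bytes) produces just two tokens: a literal 'a', then a copy with distance 1 and length 6 followed by a final 'a'. Decoding those two tokens restores all eight bytes.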

1 answers

solution1
3 2015-05-16 12:24:18