What is an efficient (in terms of time complexity) run-length encoding algorithm for an input stream of arbitrary but finite length? An algorithm for substrings of length 1 can be implemented in C as:
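    #include <stdio.h>

    /* Print each run of identical characters as <count><char>. */
    void encoding(const char *bytes)
    {
        const char *s = bytes;
        while (*s) {
            int c = 1;      /* length of the current run */
            char ch = *s;   /* character being repeated */
            /* Advance while the next character continues the run. */
            while (*s && *s == *(s + 1)) {
                c++;
                s++;
            }
            printf("%d%c", c, ch);
            s++;            /* move to the first character of the next run */
        }
    }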
However, I am looking for a better algorithm that can encode repeated substrings of any length. For example, for the input "abbabb" the above code will print "1a2b1a2b", but a better algorithm could encode it as "2abb".
The implementation language (C/Python is my choice) is not an issue as an algorithm is all I am looking for.
Any algorithm that can find repeated substrings up to a certain length can be used to implement Lempel-Ziv compression with a sliding window of that length.
So I would look into Lempel-Ziv encoders and use one of those. Or even better: drop the run-length encoding and implement Lempel-Ziv itself. A run of one character is just a special case of a repeated substring, so Lempel-Ziv will generally compress at least as well.
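To make the sliding-window idea concrete, here is a minimal sketch of an LZ77-style encoder. It is an illustration under stated assumptions, not a production compressor: the function name `lz77_encode`, the `WINDOW` size, and the textual `(offset,length,next)` output format are all hypothetical choices made for readability, and the match search is brute force rather than hash- or tree-based.

```c
#include <stdio.h>
#include <string.h>

#define WINDOW 255  /* assumed sliding-window size; an arbitrary illustrative value */

/* Hypothetical helper: encode the NUL-terminated string `in` as LZ77-style
 * (offset,length,next) triples, written as text into `out`.
 * Returns the number of triples written. */
size_t lz77_encode(const char *in, char *out, size_t out_size)
{
    size_t n = strlen(in), pos = 0, used = 0, triples = 0;

    if (out_size > 0)
        out[0] = '\0';

    while (pos < n) {
        size_t best_len = 0, best_off = 0;
        size_t start = pos > WINDOW ? pos - WINDOW : 0;

        /* Brute-force search of the window for the longest match. */
        for (size_t cand = start; cand < pos; cand++) {
            size_t len = 0;
            /* Keep one character in reserve for the literal `next`;
             * the match is allowed to overlap the current position. */
            while (pos + len + 1 < n && in[cand + len] == in[pos + len])
                len++;
            if (len > best_len) {
                best_len = len;
                best_off = pos - cand;
            }
        }

        int w = snprintf(out + used, out_size - used, "(%zu,%zu,%c)",
                         best_off, best_len, in[pos + best_len]);
        if (w < 0 || (size_t)w >= out_size - used)
            break;  /* output buffer too small: stop early */

        used += (size_t)w;
        pos += best_len + 1;  /* skip the match plus the literal */
        triples++;
    }
    return triples;
}
```

Called on "abbabbabbabb" with a large enough buffer, this sketch produces "(0,0,a)(0,0,b)(1,1,a)(3,7,b)": the last triple says "go back 3 characters, copy 7, then emit b", which covers the repeated "abb" blocks in one step. The search as written costs roughly O(n * WINDOW * match length) time; real encoders replace it with hash chains or a suffix structure.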