
Run length encoding of arbitrary length substrings

What is an efficient (in terms of time complexity) run-length encoding algorithm for an input stream of arbitrary but finite length? An algorithm for substrings of length 1 can be implemented in C as:

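#include <stdio.h>

/* Print each run as <count><char>, e.g. "aab" -> "2a1b". */
void encoding(const char *bytes)
{
    const char *s = bytes;

    while (*s) {
        int c = 1;      /* length of the current run */
        char ch = *s;   /* character being repeated */

        /* Advance while the next character continues the run. */
        while (*(s + 1) == ch) {
            c++;
            s++;
        }
        printf("%d%c", c, ch);
        s++;
    }
}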

However, I am looking for a better algorithm that can encode repeated substrings of any length. For example, for the input "abbabb" the above code prints "1a2b1a2b", but a better algorithm could encode it as "2abb".
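To make the goal concrete, here is a minimal brute-force sketch of such an encoder, not a tuned solution. The function name encode_blocks and its buffer-based interface are assumptions made for illustration. At each position it tries every block length, counts how often that block repeats back to back, and greedily keeps the choice that covers the most input. It runs in roughly cubic time in the worst case, and the "<count><block>" output is ambiguous to decode without an extra delimiter or block length, so treat it only as a statement of the problem in code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the original post): greedy block
 * run-length encoder. Writes "<count><block>" groups into out and
 * returns the number of characters written, or -1 if out is too small. */
int encode_blocks(const char *s, char *out, size_t cap)
{
    assert(s != NULL && out != NULL);
    size_t n = strlen(s), i = 0, used = 0;

    if (cap == 0)
        return -1;
    out[0] = '\0';

    while (i < n) {
        size_t best_len = 1, best_count = 1;

        /* Try every block length that could repeat at least twice. */
        for (size_t len = 1; len <= (n - i) / 2; len++) {
            size_t count = 1;
            while (i + (count + 1) * len <= n &&
                   memcmp(s + i, s + i + count * len, len) == 0)
                count++;
            /* Keep the choice covering the most input; ties go to the
             * shorter block because it was found first. */
            if (count > 1 && count * len > best_count * best_len) {
                best_len = len;
                best_count = count;
            }
        }

        int w = snprintf(out + used, cap - used, "%zu%.*s",
                         best_count, (int)best_len, s + i);
        if (w < 0 || (size_t)w >= cap - used)
            return -1;
        used += (size_t)w;
        i += best_len * best_count;
    }
    return (int)used;
}
```

Calling encode_blocks("abbabb", buf, sizeof buf) with a large enough buf leaves "2abb" in buf, while an input with only single-character runs such as "aaab" falls back to the ordinary "3a1b" form.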

The implementation language is not an issue (C or Python would be my choice), as an algorithm is all I am looking for.

Any algorithm that can find a repeated substring of a certain length can be used to implement Lempel-Ziv compression with a sliding window of that length.

So I would look into Lempel-Ziv encoders and use one of those. Or even better: drop the run-length encoding and implement Lempel-Ziv itself, which should generally give better compression because it also handles repeats that are not directly adjacent.
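As a rough illustration of the sliding-window idea, here is a minimal LZ77-style sketch. The Token struct, the function names, and the WINDOW size of 255 are all assumptions chosen for this example, not part of any standard format. The encoder does a brute-force search of the window for the longest earlier match and emits (offset, length, next literal) tokens; the decoder copies byte by byte so that a match may overlap the text it is producing, which is how a long repetition collapses into one token.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define WINDOW 255  /* assumed search-window size for this sketch */

/* Hypothetical token type: copy `len` bytes starting `off` bytes back,
 * then append the literal `next`. */
typedef struct {
    size_t off;
    size_t len;
    char next;
} Token;

/* Returns the number of tokens written, or -1 if `cap` tokens are not enough. */
int lz77_encode(const char *s, Token *out, size_t cap)
{
    assert(s != NULL && out != NULL);
    size_t n = strlen(s), i = 0, ntok = 0;

    while (i < n) {
        size_t best_off = 0, best_len = 0;
        size_t start = i > WINDOW ? i - WINDOW : 0;

        /* Brute-force search of the window for the longest match. */
        for (size_t j = start; j < i; j++) {
            size_t len = 0;
            /* Stop one short of the end so a literal always follows. */
            while (i + len + 1 < n && s[j + len] == s[i + len])
                len++;
            if (len > best_len) {
                best_len = len;
                best_off = i - j;
            }
        }

        if (ntok == cap)
            return -1;
        out[ntok++] = (Token){ best_off, best_len, s[i + best_len] };
        i += best_len + 1;
    }
    return (int)ntok;
}

/* Rebuilds the text; returns its length, or -1 if `cap` is too small
 * or a token refers to data before the start of the output. */
int lz77_decode(const Token *t, size_t ntok, char *out, size_t cap)
{
    assert(t != NULL && out != NULL);
    size_t pos = 0;

    for (size_t k = 0; k < ntok; k++) {
        if (t[k].off > pos || (t[k].len > 0 && t[k].off == 0))
            return -1;
        if (pos + t[k].len + 2 > cap)   /* copy + literal + NUL */
            return -1;
        /* Byte-by-byte copy so overlapping matches work. */
        for (size_t c = 0; c < t[k].len; c++, pos++)
            out[pos] = out[pos - t[k].off];
        out[pos++] = t[k].next;
    }
    if (pos >= cap)
        return -1;
    out[pos] = '\0';
    return (int)pos;
}
```

For the input "abbabbabbabb" this sketch produces four tokens, the last of which copies seven bytes from three positions back. A practical encoder would replace the quadratic window search with a hash table or suffix structure and pack the tokens into bits.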
