Run-length encoding gives the same number for all repeated values

Question

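I'm building a compressor for short strings that mixes several compression algorithms. RLE is one of them, and it is giving me problems.

The script I have so far is below; it is still quite incomplete: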

# -*- coding: utf-8 -*-

import re

dictionary = {'hello':'\§', 'world':'\°', 'the': '\@', 'for': '\]'}
a_test_string = 'hello******** to the world****!'

def compress(string, dictionary):
    pattern = re.compile( '|'.join(dictionary.keys() )) 
    result = pattern.sub(lambda value: dictionary[value.group() ], string)

    '''
    Here I should also implement a snippet to check for characters beginning with "\" so that they won't get replaced and screw up the result.
    '''

    for character in string:
        occurrence = string.count(character*2)
        there_is_more_than_one_occurrence = occurrence > 1

        if there_is_more_than_one_occurrence:

                second_regex_pass_for_multiple_occurrences = re.sub('\*\*\*+', '/'+character+str(occurrence), result)
                result = second_regex_pass_for_multiple_occurrences

    print 'Original string:', string

    print 'Compressed string:', result

    print 'Original size:', len(string)

    print 'Compressed size:', len(result)


compress(a_test_string, dictionary)

When I run the function, I get the following:

Original string: hello******** to the world****!
Compressed string: \§/*6 to \@ \°/*6!
Original size: 31
Compressed size: 20

But I should get:

Original string: hello******** to the world****!
Compressed string: \§/*8 to \@ \°/*4!
Original size: 31
Compressed size: 20

What am I doing wrong here, such that I get 6 for both runs of repeated characters?

Answer 1

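I won't try to work out exactly what you're doing, but a good way to debug this is to add some print statements inside the for loop, or to use the Python debugger to see what is actually happening. Try running some of those calls yourself and look at what they return.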
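As an illustration of that kind of check, here is a minimal sketch (written in Python 3 syntax, unlike the Python 2 script in the question; the helper name `inspect_counts` is made up for this example) that reports what `string.count(character*2)` returns for each distinct character of the test string:

```python
def inspect_counts(string):
    """Return (character, count of the doubled character) pairs, one per distinct character."""
    seen = []
    report = []
    for character in string:
        if character in seen:
            continue  # report each distinct character only once
        seen.append(character)
        # Same call as in the question: counts non-overlapping 'cc' in the WHOLE string.
        report.append((character, string.count(character * 2)))
    return report


for character, occurrence in inspect_counts('hello******** to the world****!'):
    if occurrence > 0:
        print(repr(character), occurrence)
```

Only `'l'` and `'*'` have a non-zero count here, and the count for `'*'` covers both runs of asterisks together.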

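I think your main problem is that "string.count" returns the count for the whole string, so the first time it checks for character*2 it sees all 12 asterisks (or, technically, all 6 non-overlapping ** patterns). When the for loop reaches the next group of *, it is still checking the whole string. On top of that, the re.sub call replaces every run of three or more asterisks in a single pass, so both runs end up with that same 6. Hope this helps.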
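One possible way around this, sketched here under assumptions rather than taken from the original script, is to let the regular expression find each run on its own and compute the length from the match itself. The function name `encode_runs` and the `min_run` parameter are hypothetical; the `/<char><length>` output format and the threshold of three follow the question. The sketch uses Python 3 syntax and leaves out the dictionary substitution step.

```python
import re


def encode_runs(text, min_run=3):
    """Replace each run of at least min_run identical characters with '/<char><length>'."""
    # (.) captures one character; \1{n,} requires at least n further copies of it,
    # so every run is matched separately instead of being counted over the whole string.
    pattern = re.compile(r'(.)\1{%d,}' % (min_run - 1))
    return pattern.sub(lambda m: '/' + m.group(1) + str(len(m.group(0))), text)


print(encode_runs('hello******** to the world****!'))
```

Because the length comes from `m.group(0)`, the two runs of asterisks are encoded as `/*8` and `/*4`. Note that a run of digits would produce an ambiguous token with this format, so a real compressor would need an escaping rule for that case.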

Solution 1
0 2014-06-09 15:16:39
