Run length encoding in Python

I'm trying to write a simple Python algorithm to solve this problem. Can you please help me figure out why my code is not working:

Problem:

If any character is repeated more than 4 times, the entire set of repeated characters should be replaced with a slash '/', followed by a 2-digit number which is the length of this run of repeated characters, and the character. For example, "aaaaa" would be encoded as "/05a". Runs of 4 or fewer characters should not be replaced, since performing the encoding would not decrease the length of the string.

My Code:

def runLengthEncode (plainText):
    res=''
    a=''
    for i in plainText:
        if a.count(i)>0:
            a+=i
        else:
            if len(a)>4:
                res+="/" + str(len(a)) + a[0][:1]
            else:
                res+=a
                a=i
    return(res)

I see many great solutions here, but none that feels very Pythonic to my eyes. So I'm contributing an implementation I wrote myself today for this problem.

from itertools import groupby
from typing import Iterator, Tuple

def run_length_encode(data: str) -> Iterator[Tuple[str, int]]:
    """Returns run length encoded Tuples for string"""
    # A memory efficient (lazy) and pythonic solution using generators
    return ((x, sum(1 for _ in y)) for x, y in groupby(data))

This will return a generator of tuples with the character and number of instances, but it can easily be modified to return a string as well. A benefit of doing it this way is that it is all lazily evaluated and won't consume more memory or CPU than needed if you don't need to exhaust the entire search space.
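To illustrate the laziness, here is a small sketch that pulls only the first two runs with itertools.islice. The sample input and the name first_runs are made up for this example; the generator itself is the one from the answer.

```python
from itertools import groupby, islice
from typing import Iterator, Tuple


def run_length_encode(data: str) -> Iterator[Tuple[str, int]]:
    """Yield (character, run length) pairs lazily."""
    return ((x, sum(1 for _ in y)) for x, y in groupby(data))


# Only the first two runs are requested, so the later runs are never counted.
first_runs = list(islice(run_length_encode("aaabccccd"), 2))
print(first_runs)  # [('a', 3), ('b', 1)]
```

Wrapping the call in list() instead would consume the whole input and give every run.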

If you still want string encoding, the code can quite easily be modified for that use case like this:

from itertools import groupby

def run_length_encode(data: str) -> str:
    """Returns run length encoded string for data"""
    # A memory efficient (lazy) and pythonic solution using generators
    return "".join(f"{x}{sum(1 for _ in y)}" for x, y in groupby(data))

This is a more generic run length encoding for all lengths, not just for runs of more than 4 characters. But it could also quite easily be adapted with a conditional for the string if wanted.
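One way that conditional could look, as a rough sketch rather than the answer's own code: runs longer than 4 become "/NNc" as the question asks, and shorter runs are copied through unchanged. The function name run_length_encode_long_runs is invented here, and runs are assumed to be shorter than 100 so the count fits in two digits.

```python
from itertools import groupby


def run_length_encode_long_runs(data: str) -> str:
    """Encode only runs longer than 4 characters as '/NNc'."""
    parts = []
    for char, group in groupby(data):
        count = sum(1 for _ in group)
        # Runs of 4 or fewer characters are copied through unchanged
        parts.append(f"/{count:02d}{char}" if count > 4 else char * count)
    return "".join(parts)


print(run_length_encode_long_runs("aaaaabbbbcd"))  # /05abbbbcd
```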

Aside from setting a=i after encoding a sequence, and setting a width for your int when it is printed into the string, you could also do the following, which takes advantage of Python's groupby. It's also a good idea to use format when constructing strings.

from itertools import groupby

def runLengthEncode (plainText):
    res = []

    for k,i in groupby(plainText):
        run = list(i)
        if(len(run) > 4):
            res.append("/{:02}{}".format(len(run), k))
        else:
            res.extend(run)

    return "".join(res)

Rosetta Code has a lot of implementations that should be easily adaptable to your use case.

Here is Python code with regular expressions:

from re import sub

def encode(text):
    '''
    Doctest:
        >>> encode('WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW')
        '12W1B12W3B24W1B14W'    
    '''
    return sub(r'(.)\1*', lambda m: str(len(m.group(0))) + m.group(1),
               text)

def decode(text):
    '''
    Doctest:
        >>> decode('12W1B12W3B24W1B14W')
        'WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW'
    '''
    return sub(r'(\d+)(\D)', lambda m: m.group(2) * int(m.group(1)),
               text)

textin = "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW"
assert decode(encode(textin)) == textin
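
The same regular-expression idea can probably be adapted to the "/05a" format from the question. The following is only a sketch: the names encode_long_runs and decode_long_runs are invented here, and it assumes runs are shorter than 100 characters and that the plain text itself contains no '/'.

```python
import re


def encode_long_runs(text):
    # (.)\1{4,} matches a character followed by at least 4 repeats of it,
    # i.e. a run of 5 or more; shorter runs are left untouched.
    return re.sub(r'(.)\1{4,}',
                  lambda m: '/%02d%s' % (len(m.group(0)), m.group(1)),
                  text)


def decode_long_runs(text):
    # Assumes every '/' in the input starts a '/NNc' token.
    return re.sub(r'/(\d{2})(.)',
                  lambda m: m.group(2) * int(m.group(1)),
                  text)


sample = "a" * 5 + "b" * 4 + "c" * 12
print(encode_long_runs(sample))  # /05abbbb/12c
assert decode_long_runs(encode_long_runs(sample)) == sample
```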

Just observe the behaviour:

>>> runLengthEncode("abcd")
'abc'

The last character is ignored. You have to append what you've collected.

>>> runLengthEncode("abbbbbcd")
'a/5b/5b'

Oops, problem after encoding. You should set a=i even if you found a long enough sequence.
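
Putting both observations together, a corrected version of the original function might look like the sketch below. The helper encodeRun is a name introduced here for illustration, and the two-digit padding follows the problem statement.

```python
def encodeRun(run):
    # Hypothetical helper: encode a run as '/NNc' only if it is longer than 4
    if len(run) > 4:
        return "/{:02d}{}".format(len(run), run[0])
    return run


def runLengthEncode(plainText):
    res = ''
    a = ''  # current run of identical characters
    for i in plainText:
        if a.count(i) > 0:
            a += i
        else:
            res += encodeRun(a)
            a = i  # start a new run whether or not the old one was encoded
    # Append the last run, which the loop never emits by itself
    return res + encodeRun(a)


print(runLengthEncode("abcd"))      # abcd
print(runLengthEncode("abbbbbcd"))  # a/05bcd
```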

You can use the groupby() function combined with a list/generator comprehension:

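from itertools import groupby

# s is the string to encode. itertools.imap only exists in Python 2;
# a nested generator expression does the same job in Python 3.
# Short runs are copied in full (x * reps) instead of being cut to one character.
''.join(x * reps if reps <= 4 else "/%02d%s" % (reps, x)
        for x, reps in ((k, len(list(g))) for k, g in groupby(s)))

Split = list(input("Enter string: "))
Split.append("")  # sentinel so the last run is also printed
a = 0
for i in range(len(Split) - 1):
    a = a + 1
    # The run ends where the next character differs
    if Split[i] != Split[i+1]:
        print(Split[i], a)
        a = 0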

This is much easier and works every time.

def RLE_comp_encode(text):
    if not text:
        return ''  # an empty string has no runs
    comp_text, r = '', 1
    for i in range(1, len(text)):
        if text[i] == text[i-1]:
            r += 1
        else:
            comp_text += str(r) + text[i-1]
            r = 1
    # Emit the final run; the loop only emits a run when the next one starts
    comp_text += str(r) + text[-1]
    return comp_text

This worked for me.

I know this is not the most efficient solution, but we haven't studied functions like groupby() yet, so here's what I did:

def runLengthEncode(plainText):
    res = ''
    a = ''     # current run of identical characters
    count = 0  # number of characters read so far
    for i in plainText:
        count += 1
        if a.count(i) > 0:
            a += i
        else:
            # The run has ended: encode it if it is longer than 4
            if len(a) > 4:
                if len(a) < 10:
                    res += "/0" + str(len(a)) + a[0]
                else:
                    res += "/" + str(len(a)) + a[0]
            else:
                res += a
            a = i
        # After the last character, emit the run that is still pending
        if count == len(plainText):
            if len(a) > 4:
                if len(a) < 10:
                    res += "/0" + str(len(a)) + a[0]
                else:
                    res += "/" + str(len(a)) + a[0]
            else:
                res += a
    return res

text = input("Please enter the string to encode: ")
encoded = []
index = 0
amount = 1
while index <= len(text) - 1:
    # A run ends at the last character or where the next character differs
    if index == len(text) - 1 or text[index] != text[index + 1]:
        encoded.append((text[index], amount))
        amount = 1
    else:
        amount = amount + 1
    index = index + 1
print(encoded)

An easy solution to run-length encoding that I can think of:

For encoding into a string like "a4b5c6d7...":

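def encode(s):
    # Count consecutive runs; s.count(c) would merge separate runs of the
    # same character (e.g. "aabaa") into one total.
    runs = []  # [character, run length] pairs in order of appearance
    for c in s:
        if runs and runs[-1][0] == c:
            runs[-1][1] += 1
        else:
            runs.append([c, 1])
    return "".join(k + str(v) for k, v in runs)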

For decoding back into a string like "aaaaaabbbdddddccccc....":

def decode(s):
    # Pairs each character with the single digit after it, so counts above 9 are not supported
    return "".join(c * int(n) for c, n in zip(s[0::2], s[1::2]))

Fairly easy to read and simple.


 