
Run-length encoding of symbols

I am trying to write run-length encoding code in Python. If a message consists of a long sequence of symbols, I am meant to encode it as a list of each symbol and the number of times it occurs. This is my code:

alphabets = ['a','b','c','d','e','f','g','h','i','j','k',
             'l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
char_count = 0
translate = ''

words = input('Enter your word:  ')

for char in words:
    if char in alphabets:
        char_count += 1
        translate += char + str(char_count)

print(translate)

When I run my program, this is what I get:

Enter your word:  abbbbaaabbaaa
a1b2b3b4b5a6a7a8b9b10a11a12a13

The output should actually be:

a1b4a3b2a3

Is there a way to fix this?
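The loop above goes wrong in two places: char_count is never reset when the symbol changes, and a symbol/count pair is appended for every character instead of once per run. A minimal sketch of a fixed loop, assuming the goal is "symbol followed by its run length" (the function name run_length_encode is hypothetical, not from the question):

```python
def run_length_encode(text: str) -> str:
    """Encode each run of identical characters as <char><run length>."""
    if not text:
        return ''
    pieces = []
    current = text[0]   # symbol of the run currently being counted
    count = 0           # length of that run so far
    for char in text:
        if char == current:
            count += 1
        else:
            # the run has ended: record it, then start counting the new symbol
            pieces.append(current + str(count))
            current = char
            count = 1
    pieces.append(current + str(count))  # record the final run
    return ''.join(pieces)


print(run_length_encode('abbbbaaabbaaa'))  # a1b4a3b2a3
```

Unlike the original, this sketch encodes every character rather than only those in alphabets; if that filter matters, the input could be filtered before encoding.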

You can simply use regular expressions to solve the problem:

import re
# group(1) is a whole run of repeated characters, group(2) is the character itself
translate = re.sub(r"((.)\2*)", lambda m: m.group(2) + str(len(m.group(1))), words)

This regex finds every run of identical consecutive symbols in the words string and replaces each run with the symbol followed by the run's length.
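To see exactly what the pattern ((.)\2*) matches, re.finditer can list each run together with the captured symbol. This is only an inspection aid using the same pattern; the helper name runs is hypothetical:

```python
import re

RUN = re.compile(r"((.)\2*)")  # group 1: the whole run, group 2: its symbol


def runs(text):
    """Return (symbol, run) pairs for every maximal run of equal characters."""
    return [(m.group(2), m.group(1)) for m in RUN.finditer(text)]


for symbol, run in runs('abbbbaaabbaaa'):
    print(symbol, run, len(run))
```

Because \2* is greedy and each match starts where the previous one ended, every match is a maximal run. One caveat: . does not match a newline unless re.DOTALL is passed, so with the re.sub version newline characters would be left in the output unencoded.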

Another option is to use itertools.groupby:

from itertools import groupby
# each (letter, grouper) pair is one run; len(list(grouper)) is the run length
translate = ''.join([f'{letter}{len(list(grouper))}' for letter, grouper in groupby(words)])

Explanation

itertools.groupby splits the string into chunks of identical consecutive letters, turns each chunk into a pair (letter, grouper), and returns an iterator that yields these pairs:

>>> groupby('abbbbaaabbaaa')
<itertools.groupby at 0x6fffeafa098>

>>> for chunk in groupby('abbbbaaabbaaa'):
...     print(chunk)
('a', <itertools._grouper object at 0x6fffeaf2cf8>)
('b', <itertools._grouper object at 0x6fffeae9908>)
('a', <itertools._grouper object at 0x6fffeae9898>)
('b', <itertools._grouper object at 0x6fffeaf2320>)
('a', <itertools._grouper object at 0x6fffeae9898>)

Each itertools._grouper object is itself an iterator that yields all the letters in the corresponding chunk. Converting it to a list lets us take its length and append it to the result.
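Since the question describes the target as a list of symbols and their counts, groupby can also produce (symbol, count) pairs directly, with the string built from them afterwards. A sketch under that reading; the helper names rle_pairs and pairs_to_string are hypothetical, and sum(1 for _ in grouper) counts a run without materializing it as a list:

```python
from itertools import groupby


def rle_pairs(text: str) -> list:
    """Return [(symbol, count), ...] for each run of equal consecutive symbols."""
    # Each grouper is consumed immediately: groupby shares one underlying
    # iterator, so a grouper is no longer usable once groupby advances.
    return [(symbol, sum(1 for _ in grouper)) for symbol, grouper in groupby(text)]


def pairs_to_string(pairs) -> str:
    """Join (symbol, count) pairs into the 'a1b4...' string form."""
    return ''.join(f'{symbol}{count}' for symbol, count in pairs)


pairs = rle_pairs('abbbbaaabbaaa')
print(pairs)                   # [('a', 1), ('b', 4), ('a', 3), ('b', 2), ('a', 3)]
print(pairs_to_string(pairs))  # a1b4a3b2a3
```

Keeping the pairs separate from the string form also avoids ambiguity if a symbol can itself be a digit, which the flat "a12" style string cannot express.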
