
How to count repeated characters in a string in Python

Write a Python function that performs run-length encoding on a given string and returns the run-length-encoded string.

I tried it using a loop but couldn't get the expected output.

def encode(message):    
    #Remove pass and write your logic here
    count=0
    encoded_message=[]
    for char in range(0,len(message)-1,1):
        count=1
        while(message[char]==message[char+1]):
            count=count+1
            char=char+1
        encoded_message.append(str(count)+message[char])

    return encoded_message

encoded_message=encode("ABBBBCCCCCCCCAB")
print(' '.join(encoded_message))

The expected output is 1A4B8C1A1B. What I got is 1A 4B 3B 2B 1B 8C 7C 6C 5C 4C 3C 2C 1C 1A.

You can use groupby from the itertools module:

s = "ABBBBCCCCCCCCAB"
from itertools import groupby
expected = ''.join([str(len(list(v)))+k for k,v in groupby(s)])

Output:

'1A4B8C1A1B'

groupby(s) returns an itertools.groupby object that yields a (key, group) pair for each run of consecutive equal characters. A list comprehension over it, such as [(k, list(v)) for k, v in groupby(s)], returns the runs in order:

[('A', ['A']), ('B', ['B', 'B', 'B', 'B']), ('C', ['C', 'C', 'C', 'C', 'C', 'C', 'C', 'C']), ('A', ['A']), ('B', ['B'])]

We then count the items in each group (the second element of each tuple), prepend that count as a string to the key (the first element), and join all the pieces together.
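The same idea written as an explicit loop makes each (key, group) pair visible. In this minimal sketch the helper name rle_groupby is hypothetical, and it counts each group with sum(1 for _ in group) instead of len(list(v)); that is an alternative choice which avoids building a temporary list per run, not what the answer above uses.

```python
from itertools import groupby

def rle_groupby(s):
    """Run-length encode s, e.g. 'AAB' -> '2A1B' (hypothetical helper name)."""
    parts = []
    for char, group in groupby(s):
        # group is an iterator over one run of identical, adjacent characters
        run_length = sum(1 for _ in group)
        parts.append(f"{run_length}{char}")
    return ''.join(parts)

print(rle_groupby("ABBBBCCCCCCCCAB"))  # 1A4B8C1A1B
```

Because groupby only groups adjacent equal items, the two separate A runs stay separate, which is exactly what run-length encoding needs.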

Update: You are trying to change the iteration index inside the loop with char=char+1, but that does not change which value the loop uses next, i.e. the loop does not skip the next 2, 3 or 4 iterations. Add these two print lines to your code and you will see that the char variable you increment inside the loop is not the loop's iteration index:

...
    for char in range(0, len(message) - 1, 1):
        print('\tchar at first line : ', char, 'char id now : ', id(char))
        count = 1
        while message[char] == message[char + 1]:
            count = count + 1
            char = char + 1
            print('char now : ', char, 'char id now : ', id(char))
            ...

It should output something like:

    char at first line :  1 char id now :  11197408
char now :  2 char id now :  11197440
char now :  3 char id now :  11197472
char now :  4 char id now :  11197504

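See how the id of char changes each time: char = char + 1 does not modify the loop counter, it binds the name char to a new int object. At the start of the next iteration, the for loop rebinds char to the next value from range(), discarding your increment.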
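The effect can be reproduced in isolation. In this minimal sketch (variable names are illustrative), incrementing the for-loop variable changes nothing about which values range() hands out, whereas in a while loop you manage the index yourself, so an increment inside the body really does skip ahead.

```python
# for loop: each iteration rebinds i to the next value from range(),
# so the increment inside the body is thrown away.
for_values = []
for i in range(5):
    for_values.append(i)
    i += 2  # only rebinds the name i; the range iterator is unaffected

# while loop: the index is managed by hand, so the increment persists.
while_values = []
i = 0
while i < 5:
    while_values.append(i)
    i += 2  # actually skips ahead

print(for_values)    # [0, 1, 2, 3, 4]
print(while_values)  # [0, 2, 4]
```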

You can also use the re module to encode the string:

import re

s = 'ABBBBCCCCCCCCAB'

l = ''.join(str(len(c2)+1) + c1 for c1, c2 in re.findall(r'([A-Z])(\1*)', s))

print(l)

Prints:

1A4B8C1A1B
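The pattern above uses the character class [A-Z], which only matches uppercase ASCII letters; any other characters (lowercase letters, digits, spaces) are silently skipped by re.findall. A minimal sketch of a more general variant, assuming every character should be encoded (newlines included, via re.DOTALL), matches whole runs with (.)\1* and uses the length of the full match. The helper name rle_regex is hypothetical.

```python
import re

def rle_regex(s):
    """Run-length encode s using one regex match per run (hypothetical helper name)."""
    # (.) captures one character and \1* matches its immediate repeats;
    # re.DOTALL lets '.' match newlines as well.
    return ''.join(
        f"{len(m.group(0))}{m.group(1)}"
        for m in re.finditer(r'(.)\1*', s, re.DOTALL)
    )

print(rle_regex('ABBBBCCCCCCCCAB'))  # 1A4B8C1A1B
print(rle_regex('aa  b'))            # 2a2 1b
```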
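
Another approach collects each run of identical characters into a list, appending a '@' sentinel to the string so that the last run is also stored inside the loop:

def func(string):
    # Append a sentinel so the final run is flushed inside the loop.
    # Note: this assumes the input does not end with '@'.
    string += '@'
    runs = []          # finished runs, e.g. [['A'], ['B', 'B', 'B', 'B'], ...]
    tmp = [string[0]]  # the run currently being built

    for i in range(1, len(string)):
        if string[i] == string[i - 1]:
            tmp.append(string[i])   # same character: extend the current run
        else:
            runs.append(tmp)        # character changed: store the finished run
            tmp = [string[i]]       # and start a new one
    res = ''.join(['{}{}'.format(len(run), run[0]) for run in runs])
    return res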

string = 'ABBBBCCCCCCCCAB'         
solution = func(string)

print(solution)

Output:

1A4B8C1A1B
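The '@' sentinel works for this input, but a trailing '@' that belongs to the input merges with the sentinel and is dropped (for example, func('A@') returns '1A'). A minimal sketch of a sentinel-free alternative, with the hypothetical name rle_single_pass, keeps only the current character and its count and flushes the final run after the loop:

```python
def rle_single_pass(s):
    """Run-length encode s without a sentinel (hypothetical helper name)."""
    if not s:
        return ''
    parts = []
    current, count = s[0], 1
    for ch in s[1:]:
        if ch == current:
            count += 1                         # same character: extend the run
        else:
            parts.append(f"{count}{current}")  # run ended: record it
            current, count = ch, 1             # start a new run
    parts.append(f"{count}{current}")          # flush the last run
    return ''.join(parts)

print(rle_single_pass('ABBBBCCCCCCCCAB'))  # 1A4B8C1A1B
print(rle_single_pass('A@'))               # 1A1@
```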

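If you only need the total number of times each letter occurs, this logic returns a dictionary of letter frequencies. Note that it counts all occurrences, not runs, so it does not produce a run-length encoding: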

s = "ABBBBCCCCCCCCAB"
d = {i:0 for i in s}
for i in s:
    d[i] += 1
print(d)

**Output:**
{'A': 2, 'B': 5, 'C': 8}
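The standard library's collections.Counter builds the same frequency table in one call and adds helpers such as most_common(). A minimal sketch:

```python
from collections import Counter

s = "ABBBBCCCCCCCCAB"
freq = Counter(s)           # counts every occurrence, adjacent or not
print(dict(freq))           # {'A': 2, 'B': 5, 'C': 8}
print(freq.most_common(1))  # [('C', 8)]
```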

If you want to fix your own function, here is a corrected variant:

def encode(message):
    result = []
    i = 0
    while i < len(message):
        # Count how many consecutive characters starting at i are identical.
        count = 1
        while i + count < len(message) and message[i + count - 1] == message[i + count]:
            count += 1
        # Jump over the whole run, then record "<count><char>".
        i += count
        result.append("{}{}".format(count, message[i - 1]))
    return result

What's changed:

  1. The for loop is replaced with while. Why? Because you need to jump over indexes inside the loop. range(0,len(message)-1,1) produces 0, 1, 2, ..., and whatever you do with the char variable inside the loop, it won't affect the next iteration. To be able to skip indexes, I used a while loop with an explicitly managed index variable (i = 0).
  2. Changed the conditions of the inner while loop. Now there are two conditions:
    • message[i + count - 1] == message[i + count] - checks whether the next symbol is the same as the current one;
    • i + count < len(message) - prevents the inner loop from accessing an index out of range (it is checked first, so the comparison never reads past the end of the string).
  3. The "main" index (i) is updated outside of the inner loop.
  4. The outer loop runs while i < len(message), not len(message) - 1, so the last run is always processed, even when it is a single character (as in "AAB" or "A").

Since encode returns a list, use ''.join(encode("ABBBBCCCCCCCCAB")) to get 1A4B8C1A1B; the ' '.join in the question puts spaces between the runs.
