How to count repeated characters in a string in Python

Write a Python function that performs run-length encoding on a given string and returns the run-length encoded string.

I tried it with a loop but couldn't get the expected output.

def encode(message):    
    #Remove pass and write your logic here
    count=0
    encoded_message=[]
    for char in range(0,len(message)-1,1):
        count=1
        while(message[char]==message[char+1]):

             count=count+1;
             char=char+1
        encoded_message.append(str(count)+message[char])

    return encoded_message

encoded_message=encode("ABBBBCCCCCCCCAB")
print(' '.join(encoded_message))

The expected output is 1A4B8C1A1B. What I got is 1A 4B 3B 2B 1B 8C 7C 6C 5C 4C 3C 2C 1C 1A.

You can use groupby from the itertools module:

from itertools import groupby

s = "ABBBBCCCCCCCCAB"
# groupby yields (character, run) pairs; the length of each run is its count
expected = ''.join(str(len(list(v))) + k for k, v in groupby(s))

Output:

'1A4B8C1A1B'

groupby(s) returns an itertools.groupby object. A list comprehension over it, such as [(k, list(v)) for k, v in groupby(s)], shows the consecutive groups in order:

[('A', ['A']), ('B', ['B', 'B', 'B', 'B']), ('C', ['C', 'C', 'C', 'C', 'C', 'C', 'C', 'C']), ('A', ['A']), ('B', ['B'])]

We just count the items in the second element of each tuple, put that count (as a string) in front of the first element, and join all the pieces.
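The same idea can be wrapped in a reusable function. The sketch below is one possible way to do it (rle_encode is a name chosen here for illustration). It counts each group with sum() instead of building a list, which is an optional variation and not part of the answer above.

```python
from itertools import groupby


def rle_encode(text):
    """Return the run-length encoding of text, e.g. 'AAB' -> '2A1B'."""
    parts = []
    for char, run in groupby(text):
        # run is an iterator over the consecutive copies of char
        count = sum(1 for _ in run)
        parts.append(str(count) + char)
    return "".join(parts)


print(rle_encode("ABBBBCCCCCCCCAB"))  # prints 1A4B8C1A1B
```

An empty string simply produces an empty result, because groupby yields no groups.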

Update: You are trying to move the iteration forward inside the loop by doing char=char+1, but that does not change what the for loop does next. On the following iteration, char is simply rebound to the next value from range, so the loop does not skip the next 2, 3, or 4 indexes. Add these two print lines to your code and you will see that the char you increment inside the while loop is not the value the for loop continues from:

...
    for char in range(0,len(message)-1,1):
        print('\tchar at first line : ', char, 'char id now : ', id(char))
        count=1
        while(message[char]==message[char+1]):
            count=count+1
            char=char+1
            print('char now : ', char, 'char id now : ', id(char))
        ...

It should output something like:

    char at first line :  1 char id now :  11197408
char now :  2 char id now :  11197440
char now :  3 char id now :  11197472
char now :  4 char id now :  11197504

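Notice how the id of char changes every time it is reassigned. Each assignment binds char to a different int object, and the for loop rebinds it again at the start of the next iteration.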
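The effect can be seen in isolation with a short sketch (the variable names are illustrative only): the loop variable is changed inside the body, yet the next iteration still receives the next value from range.

```python
seen = []
for index in range(4):
    seen.append(index)   # value handed out by range for this iteration
    index = index + 10   # rebinding the name does not touch the range iterator

print(seen)   # prints [0, 1, 2, 3]
print(index)  # prints 13: only the last rebinding survives the loop
```

This is why skipping over a run of characters needs an index you control yourself, for example in a while loop.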

You can also use the re module to encode the string:

s = 'ABBBBCCCCCCCCAB'

import re

l = ''.join(str(len(c2)+1) + c1 for c1, c2 in re.findall(r'([A-Z])(\1*)', s))

print(l)

Prints:

1A4B8C1A1B

def func(string):
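    if not string:
        return ''
    groups = []            # completed runs, e.g. ['B', 'B', 'B', 'B']
    tmp = [string[0]]      # run currently being collected

    for i in range(1, len(string)):
        if string[i] == string[i-1]:
            tmp.append(string[i])   # same character: extend the current run
        else:
            groups.append(tmp)      # run ended: store it and start a new one
            tmp = [string[i]]
    groups.append(tmp)              # store the final run
    res = ''.join('{}{}'.format(len(g), g[0]) for g in groups)
    return res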

string = 'ABBBBCCCCCCCCAB'         
solution = func(string)

print(solution)

Output:

1A4B8C1A1B

This logic gives you a dictionary with the frequency of each letter. Note that it counts total occurrences, not consecutive runs, so it does not produce the run-length encoding by itself.

s = "ABBBBCCCCCCCCAB"
d = {i:0 for i in s}
for i in s:
    d[i] += 1
print(d)

Output:
{'A': 2, 'B': 5, 'C': 8}
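For reference, the standard library's collections.Counter builds the same kind of frequency table. The sketch below is only an illustration of that alternative. The totals it prints merge the two separate runs of 'A' and of 'B' in the sample string.

```python
from collections import Counter

s = "ABBBBCCCCCCCCAB"
totals = Counter(s)  # total occurrences of each character, regardless of position

print(dict(totals))  # prints {'A': 2, 'B': 5, 'C': 8}
```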

If you want to fix your function, here is a fixed variant:

def encode(message):
    result = []
    i = count = 0
    while i < len(message) - 1:
        count = 1
        # extend the current run while the next character matches
        while i + count < len(message) and message[i + count - 1] == message[i + count]:
            count += 1
        i += count
        result.append("{}{}".format(count, message[i - 1]))
    # the loop stops one short of the end, so a single trailing character is added here
    if i == len(message) - 1:
        result.append("1" + message[-1])
    return result

What's changed:

  1. The for loop is replaced with while. Why? Because you need to jump over indexes inside the loop. range(0,len(message)-1,1) yields the values 0, 1, 2, ... and whatever you do with the char variable inside the loop does not affect the next iteration. To be able to skip some indexes, I used a while loop with predefined ( i = count = 0 ) index and count variables.
  2. Changed the conditions of the inner while loop. Now there are two conditions:
    • message[i + count - 1] == message[i + count] - checks whether the next symbol is the same as the current one;
    • i + count < len(message) - prevents the inner loop from accessing an index out of range.
  3. The "main" index ( i ) is updated outside of the inner loop.
  4. if i == len(message) - 1: a post-condition added after the loop so that the last character is not missed when it forms a run of its own.
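As a rough check of the index-skipping idea described in the list above, here is a minimal sketch (the names encode_runs and reference are made up for illustration and do not come from the answer). It uses a single while loop that jumps past each run and compares the result with itertools.groupby on a few edge cases, including an empty string and a single trailing character.

```python
from itertools import groupby


def encode_runs(message):
    """Run-length encode message by jumping over each run (hypothetical helper)."""
    result = []
    i = 0
    while i < len(message):
        count = 1
        # Extend the run while the next character equals the current one.
        while i + count < len(message) and message[i + count] == message[i]:
            count += 1
        result.append("{}{}".format(count, message[i]))
        i += count  # skip the whole run; a for loop over range cannot do this
    return "".join(result)


def reference(message):
    """Reference encoding built on itertools.groupby."""
    return "".join(str(len(list(run))) + ch for ch, run in groupby(message))


for sample in ["ABBBBCCCCCCCCAB", "AAB", "A", ""]:
    assert encode_runs(sample) == reference(sample)

print(encode_runs("ABBBBCCCCCCCCAB"))  # prints 1A4B8C1A1B
```

Letting the outer loop run up to len(message) means the last character is handled like any other run, so this sketch needs no separate check after the loop.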
