How to count repeated characters in a string in Python
Write a Python function that performs run-length encoding on a given string and returns the run-length-encoded string.
I tried it using loops but couldn't get the expected output.
def encode(message):
    #Remove pass and write your logic here
    count=0
    encoded_message=[]
    for char in range(0,len(message)-1,1):
        count=1
        while(message[char]==message[char+1]):
            count=count+1
            char=char+1
        encoded_message.append(str(count)+message[char])
    return encoded_message
encoded_message=encode("ABBBBCCCCCCCCAB")
print(' '.join(encoded_message))
Expected output is 1A4B8C1A1B.
What I got is 1A 4B 3B 2B 1B 8C 7C 6C 5C 4C 3C 2C 1C 1A
You can use groupby from the itertools module:
s = "ABBBBCCCCCCCCAB"
from itertools import groupby
expected = ''.join([str(len(list(v)))+k for k,v in groupby(s)])
Output:
'1A4B8C1A1B'
groupby(s) returns an itertools.groupby object. A list comprehension over this object, such as [(k,list(v)) for k,v in groupby(s)], gives us the runs in order:
[('A', ['A']), ('B', ['B', 'B', 'B', 'B']), ('C', ['C', 'C', 'C', 'C', 'C', 'C', 'C', 'C']), ('A', ['A']), ('B', ['B'])]
We can count the number of sub-items in the second item of each tuple, prepend that count as a string to the first item of the tuple, and join all the pieces.
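The same idea can be wrapped in a reusable function. This is only a minimal sketch: the name rle_encode is an assumption made for illustration, and it counts each group with sum() rather than building an intermediate list.

```python
from itertools import groupby


def rle_encode(text):
    """Run-length encode text, e.g. 'AAB' -> '2A1B' (hypothetical helper name)."""
    # groupby yields (key, group_iterator) for each run of equal consecutive items
    return ''.join(f"{sum(1 for _ in group)}{key}" for key, group in groupby(text))


print(rle_encode("ABBBBCCCCCCCCAB"))  # 1A4B8C1A1B
```

An empty string simply produces an empty result, because groupby yields no groups.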
Update: You are trying to change the iteration index inside the loop by doing char=char+1, but that doesn't change the iteration index, i.e. the loop does not skip the next 2, 3, or 4 iterations. The for loop rebinds char to the next value from range at the start of every iteration. Add these two print lines to your code and you will see that the char variable you increase inside the loop is not the iteration index:
...
    for char in range(0,len(message)-1,1):
        print('\tchar at first line : ', char, 'char id now : ', id(char))
        count=1
        while(message[char]==message[char+1]):
            count=count+1
            char=char+1
            print('char now : ', char, 'char id now : ', id(char))
...
It should output something like:
char at first line : 1 char id now : 11197408
char now : 2 char id now : 11197440
char now : 3 char id now : 11197472
char now : 4 char id now : 11197504
See how the id of char changes each time: every assignment binds char to a different int object, and the for loop rebinds it again on the next iteration.
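A smaller demonstration of the same point, as a sketch that is independent of the encoding code (the variable names seen and visited are made up for illustration): reassigning the loop variable of a for loop does not skip iterations, whereas a while loop with its own index can skip ahead.

```python
seen = []
for i in range(3):
    seen.append(i)  # value handed out by range at the start of this iteration
    i += 10         # rebinding i only affects the rest of this iteration

print(seen)  # [0, 1, 2]

# A while loop owns its index, so changes to it carry over
visited = []
i = 0
while i < 6:
    visited.append(i)
    i += 2  # this change persists into the next iteration

print(visited)  # [0, 2, 4]
```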
You can also use the re module to encode the string:
s = 'ABBBBCCCCCCCCAB'
import re
l = ''.join(str(len(c2)+1) + c1 for c1, c2 in re.findall(r'([A-Z])(\1*)', s))
print(l)
Prints:
1A4B8C1A1B
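The pattern above only matches uppercase letters A-Z. A possible generalisation, sketched here under the assumption that any character may appear in the input, captures one arbitrary character and then its repeats; the function name rle_encode_re is hypothetical.

```python
import re


def rle_encode_re(text):
    # (.) captures one character; \1* matches further repeats of that same character.
    # re.DOTALL lets '.' match newlines as well.
    pattern = re.compile(r'(.)\1*', re.DOTALL)
    return ''.join(f"{len(m.group(0))}{m.group(1)}" for m in pattern.finditer(text))


print(rle_encode_re("ABBBBCCCCCCCCAB"))  # 1A4B8C1A1B
print(rle_encode_re("aa  b"))            # 2a2 1b
```

Note that if the input itself contains digits, the encoded string becomes ambiguous to read back, so this form is best suited to digit-free text.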
def func(string):
    string +='@'  # sentinel so the last run is flushed inside the loop
    dic = []
    tmp =[]
    tmp += [string[0]]
    for i in range(1,len(string)):
        if string[i]==string[i-1]:
            tmp.append(string[i])
        else:
            dic.append(tmp)
            tmp=[]
            tmp.append(string[i])
    res = ''.join(['{}{}'.format(len(i),i[0]) for i in dic])
    return res
string = 'ABBBBCCCCCCCCAB'
solution = func(string)
print(solution)
Output:
1A4B8C1A1B
Use this logic; it returns a dictionary with the frequency of each letter.
s = "ABBBBCCCCCCCCAB"
d = {i:0 for i in s}
for i in s:
    d[i] += 1
print(d)
Output:
{'A': 2, 'B': 5, 'C': 8}
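Note that this dictionary holds total occurrences rather than consecutive runs, so it answers the question in the title but not the run-length encoding task. As a sketch of the same counting with the standard library, collections.Counter can be used (the variable name freq is an arbitrary choice):

```python
from collections import Counter

s = "ABBBBCCCCCCCCAB"
freq = Counter(s)  # total occurrences per character, ignoring position

print(dict(freq))           # {'A': 2, 'B': 5, 'C': 8}
print(freq.most_common(1))  # [('C', 8)]
```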
If you want to fix your function, here is a fixed variant:
def encode(message):
    result = []
    i = count = 0
    while i < len(message) - 1:
        count = 1
        # extend the current run while the next character matches
        while i + count < len(message) and message[i + count - 1] == message[i + count]:
            count += 1
        i += count
        result.append("{}{}".format(count, message[i - 1]))
    # a single trailing character is not reached by the loop above
    if i < len(message):
        result.append("1" + message[-1])
    return result
What's changed:
range(0,len(message)-1,1) yields the indices 0, 1, 2, ... and no matter what you do with the char variable inside the loop, it won't affect the next iteration. To be able to skip some indexes I used a while loop with predefined index and count variables (i = count = 0).
message[i + count - 1] == message[i + count] - checks whether the next symbol is the same as the current one;
i + count < len(message) - prevents the internal loop from accessing an index out of range.
i += count - updates the "main" index (i) outside of the internal loop.
if i < len(message): - post-condition after the loop so that the last character is not missed when it stands alone.
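For comparison, here is a single-pass sketch that avoids index arithmetic by remembering the current run instead. It is an illustrative alternative, not the answer's code: the name encode_single_pass is hypothetical, and it returns the joined string rather than a list.

```python
def encode_single_pass(message):
    """Run-length encode message in one pass (hypothetical helper name)."""
    if not message:
        return ""
    parts = []
    current, count = message[0], 1
    for ch in message[1:]:
        if ch == current:
            count += 1  # still inside the same run
        else:
            parts.append(f"{count}{current}")  # run ended, emit it
            current, count = ch, 1
    parts.append(f"{count}{current}")  # flush the final run
    return "".join(parts)


for sample in ("ABBBBCCCCCCCCAB", "AAB", "A", ""):
    print(repr(sample), "->", repr(encode_single_pass(sample)))
```

Inputs such as "AAB", "A", and the empty string are worth checking with any loop-based encoder, since they exercise the handling of the last run.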