How do I accumulate a sequence of digits in a string and convert them to one number?
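I need to decode the string 'a3b2' into 'aaabb'. The problem is that the numbers can have two or three digits. For example, in 'a10b3' the number should be detected as 10, not 1.
I need to start accumulating the digits.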
a = "a12345t5i6o2r43e2"
for i in range(0, len(a)-1):
    if a[i].isdigit() is False:
        #once i see a letter, i launch a while loop to check how long a digit streak
        #after it can be - it's 2,3,4,5 digit number etc
        print(a[i])
        current_digit_streak = ''
        counter = i+1
        while a[counter].isdigit():  #this gives index out of range error!
            current_digit_streak += a[counter]
            counter+=1
If I change the while loop to:
while a[counter].isdigit() and counter < (len(a) - 1):
it does work, but it omits the last letter. I am not supposed to use regular expressions, only loops.
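For reference, here is a minimal loop-only sketch of the accumulation described above. The function name `decode` is hypothetical, and treating a letter with no digits after it as a count of 1 is an assumption, not something the question specifies. The two points it illustrates are checking the bound before indexing and comparing against `len(encoded)` rather than `len(encoded) - 1`, so that the final digit is not dropped.

```python
def decode(encoded):
    """Expand letter+count pairs, e.g. 'a3b2' -> 'aaabb', using only loops."""
    output = ''
    i = 0
    while i < len(encoded):
        letter = encoded[i]
        i += 1
        digits = ''
        # Test the bound first so encoded[i] is never read past the end,
        # and compare against len(encoded) so the last digit is included.
        while i < len(encoded) and encoded[i].isdigit():
            digits += encoded[i]
            i += 1
        # Assumption: a letter with no count after it is emitted once.
        output += letter * (int(digits) if digits else 1)
    return output


print(decode('a3b2'))   # aaabb
print(decode('a10b3'))  # aaaaaaaaaabbb
```

Because a single index `i` is advanced by both loops, the digits consumed by the inner loop are not visited again by the outer one.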
Regular expressions are a great fit here.
import re
pat = re.compile(r"""
(\w) # a word character, followed by...
(\d+) # one or more digits""", flags=re.X)
s = "a12345t5i6o2r43e2"
groups = pat.findall(s)
# [('a', '12345'), ('t', '5'), ('i', '6'), ('o', '2'), ('r', '43'), ('e', '2')]
result = ''.join([lett*int(count) for lett, count in groups])
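Since you can't use regular expressions for some unknown reason, I'd suggest a recursive function that splits the string into parts.
import itertools

def split_into_groups(s):
    if not s:
        return []
    lett, *rest = s
    # takewhile/dropwhile return iterators, so join the digits before calling int()
    # and materialize the remainder so the emptiness check above keeps working.
    count = int(''.join(itertools.takewhile(str.isdigit, rest)))
    rest = list(itertools.dropwhile(str.isdigit, rest))
    return [(lett, count)] + split_into_groups(rest)

s = "a12345t5i6o2r43e2"
groups = split_into_groups(s)
result = ''.join([lett*count for lett, count in groups])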
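Or, using a more general (and functionally derived) pattern:
import itertools

def unfold(f, x):
    while True:
        step = f(x)
        if step is None:  # f returns None when there is nothing left to unfold
            return
        v, x = step
        yield v

def get_group(s):
    if not s:
        # Raising StopIteration here would surface as RuntimeError inside a
        # generator on Python 3.7+ (PEP 479), so signal exhaustion with None.
        return None
    lett, *rest = s
    count = int(''.join(itertools.takewhile(str.isdigit, rest)))
    rest = list(itertools.dropwhile(str.isdigit, rest))
    return lett*count, rest

s = "a12345t5i6o2r43e2"
result = ''.join(unfold(get_group, s))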
You can use groupby:
from itertools import groupby
text = 'a12345t5i6o2r43e2'
groups = [''.join(group) for _, group in groupby(text, key=str.isdigit)]
result = list(zip(groups[::2], groups[1::2]))
print(result)
Output:
[('a', '12345'), ('t', '5'), ('i', '6'), ('o', '2'), ('r', '43'), ('e', '2')]
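The pairs above still need to be expanded into the decoded string. A minimal sketch of that last step follows; the function name `decode_pairs` is hypothetical, and it assumes the text starts with a letter and that every run of letters is followed by a run of digits. Note that a run of several letters, such as 'ab3', would be repeated as a unit.

```python
from itertools import groupby


def decode_pairs(text):
    # Split the text into alternating runs of non-digits and digits.
    groups = [''.join(group) for _, group in groupby(text, key=str.isdigit)]
    # Assumption: even positions hold letters, odd positions hold their counts.
    pairs = zip(groups[::2], groups[1::2])
    return ''.join(letters * int(count) for letters, count in pairs)


print(decode_pairs('a12t5i6o2r11e2'))
```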
One possible variant:
import re

def main():
    a = "a10t5i6o2r43e2"
    items = re.findall(r'(\w)(\d+)', a)
    return ''.join([letter*int(count) for letter, count in items])
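Your for loop and your while loop use different indices to fetch tokens, which is why the characters consumed by the while loop are processed again by the for loop. You should instead use a while loop with a single index to parse the tokens: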
a = "a12t5i6o2r11e2"
i = 0
char = repeat = output = ''
while i < len(a):
    token = a[i]
    if token.isdigit():
        repeat += token
    if char and repeat and (not token.isdigit() or i == len(a) - 1):
        output += char * int(repeat)
        char = repeat = ''
    if not token.isdigit():
        char += token
    i += 1
print(output)
This outputs:
aaaaaaaaaaaatttttiiiiiioorrrrrrrrrrree
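Here is a functional solution using the itertools module. You can use the grouper recipe from the itertools documentation, or import more_itertools.grouper from the third-party more_itertools library: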
from itertools import groupby
from more_itertools import grouper
from operator import itemgetter
a = "a12t5i6o2r11e2"
it = map(''.join, map(itemgetter(1), groupby(a, key=str.isdigit)))
res = ''.join(char*int(count) for char, count in grouper(it, 2))
# res == 'aaaaaaaaaaaatttttiiiiiioorrrrrrrrrrree'
For reference, the grouper recipe:
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
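To make the recipe's behavior concrete, here is a small self-contained sketch that repeats the recipe and applies it to a list of tokens. The sample `tokens` list and the choice of `fillvalue='1'` are illustrative assumptions; the recipe works because the same iterator object is passed n times, so each output tuple consumes n consecutive items.

```python
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    # The same iterator is repeated n times, so zip_longest pulls
    # n consecutive items into each tuple.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


tokens = ['a', '12', 't', '5', 'i', '6']
pairs = list(grouper(tokens, 2))
print(pairs)  # [('a', '12'), ('t', '5'), ('i', '6')]

# With an odd number of tokens, the last tuple is padded with fillvalue.
padded = list(grouper(['a', '12', 't'], 2, fillvalue='1'))
print(padded)  # [('a', '12'), ('t', '1')]
```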
This is a bit verbose, but it works as you asked and uses only loops:
def parse_segment(string, index):
    # Return the run of digits that follows the letter at `index`.
    for i, letter in enumerate(string[index+1:]):
        if letter.isalpha():
            return string[index+1:i+index+1]
        if i + index + 1 >= len(string) - 1:
            return string[index+1:]

def segment_string(string):
    num_list = []
    for index, letter in enumerate(string):
        if letter.isalpha():
            num_list.append({'letter': letter, 'number': int(parse_segment(string, index))})
    return num_list

def list_2_string(segments):
    # `segments` replaces the original parameter name `list`, which shadowed the builtin.
    ret_string = ''
    for row in segments:
        ret_string += row['letter'] * row['number']
    return ret_string

a = "a12345t5i6o2r43e2"
segmented_string = segment_string(a)
result_string = list_2_string(segmented_string)
print(result_string)