I have the following string:
string = 'TAA15=ATT'
I make a list out of it:
string_list = list(string)
print(string_list)
and the result is:
['T', 'A', 'A', '1', '5','=', 'A', 'T', 'T']
I need to detect consecutive digits and join them into a single number, as shown below:
['T', 'A', 'A', '15','=', 'A', 'T', 'T']
I'm also quite concerned about performance. This string conversion is done thousands of times.
Thank you for any hints you can provide.
Here is a very short solution
import re
def digitsMerger(source):
    return re.findall(r'\d+|.', source)
digitsMerger('TAA15=ATT')
['T', 'A', 'A', '15', '=', 'A', 'T', 'T']
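Since the question mentions that the conversion runs thousands of times, one possible refinement is to compile the pattern once and reuse it. This is only a sketch: the names `_TOKEN_RE` and `merge_digits` are hypothetical and not part of the answer above, and because the `re` module already caches recently used patterns, the gain may be small.

```python
import re

# Hypothetical names; the pattern is compiled once and reused on every call.
_TOKEN_RE = re.compile(r'\d+|.')

def merge_digits(source):
    """Return single characters, with each run of digits joined into one item."""
    return _TOKEN_RE.findall(source)

print(merge_digits('TAA15=ATT'))
# ['T', 'A', 'A', '15', '=', 'A', 'T', 'T']
```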
Using itertools.groupby
Ex:
from itertools import groupby
string = 'TAA15=ATT'
result = []
for k, v in groupby(string, str.isdigit):
    if k:
        result.append("".join(v))
    else:
        result.extend(v)
print(result)
Output:
['T', 'A', 'A', '15', '=', 'A', 'T', 'T']
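For the performance concern raised in the question, the regex and groupby approaches can be timed with the standard `timeit` module. The sketch below is only an illustration: the function names `with_regex` and `with_groupby`, the sample string, and the call count are assumptions, and the numbers will vary by machine and Python version.

```python
import re
import timeit
from itertools import groupby

SAMPLE = 'TAA15=ATT'

def with_regex(source):
    return re.findall(r'\d+|.', source)

def with_groupby(source):
    result = []
    for is_digit, group in groupby(source, str.isdigit):
        if is_digit:
            result.append(''.join(group))  # join a run of digits
        else:
            result.extend(group)           # keep other characters separate
    return result

# Both approaches should agree before their timings are compared.
assert with_regex(SAMPLE) == with_groupby(SAMPLE)

for func in (with_regex, with_groupby):
    seconds = timeit.timeit(lambda: func(SAMPLE), number=10_000)
    print(f'{func.__name__}: {seconds:.4f} s for 10,000 calls')
```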
Another regexp:
import re
s = 'TAA15=ATT'
pattern = r'\d+|\D'
m = re.findall(pattern, s)
print(m)
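One difference between this pattern and `\d+|.` is that `.` does not match a newline by default, while `\D` does. A small sketch, using a made-up input string, shows the effect:

```python
import re

text = 'A1\n23B'  # made-up input containing a newline

dot_tokens = re.findall(r'\d+|.', text)        # '.' skips the newline
nondigit_tokens = re.findall(r'\d+|\D', text)  # '\D' keeps the newline

print(dot_tokens)       # ['A', '1', '23', 'B']
print(nondigit_tokens)  # ['A', '1', '\n', '23', 'B']
```

For single-line input such as 'TAA15=ATT', both patterns return the same list.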
You can use regular expressions, available in Python through the re library:
import re
string = 'TAA15=ATT'
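num = re.sub('[^0-9,]', "", string)  # keep only digits (and commas): '15'
pos = string.find(num)               # index where the number starts: 3
str2 = re.sub('\\d+',"", string)     # drop the digits: 'TAA=ATT'
str2 = re.sub('=',"", str2)          # drop the '=': 'TAAATT'
print(str2)
l = list()
for el in str2:
    l.append(el)
l.insert(pos, num)                   # put the number back at its original position
print(l)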
Basically, re.sub('[^0-9,]', "", string) says: take the string, match every character that is not (^ means negation) a digit (0-9) or a comma, and replace it with the second argument, i.e. an empty string. What is left is only the digits, which you can convert to an integer if needed.
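As a quick check of that substitution, the sketch below runs it on the string from the question and on a second, made-up string that contains a comma:

```python
import re

digits_only = re.sub('[^0-9,]', '', 'TAA15=ATT')
print(digits_only)        # 15 (still a string)
print(int(digits_only))   # 15 (now an integer)

# The comma inside the character class is kept as well,
# so int() would fail on this second result.
print(re.sub('[^0-9,]', '', 'A1,5=B'))  # 1,5
```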
If the = always comes right after the digits, then instead of
str2 = re.sub('\\d+',"", string)
str2 = re.sub('=',"", str2)
you can do
str2 = re.sub('\\d+=',"", string)
You can create a function that compares the last value seen with the next one and use functools.reduce:
from functools import reduce
string_list = ['T', 'A', 'A', '1', '5', 'A', 'T', 'T']
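def combine_nums(lst, nxt):
    # If the last stored item and the next character are both digits,
    # replace the last item with the two joined together.
    if lst and all(map(str.isdigit, (lst[-1], nxt))):
        return lst[:-1] + [lst[-1] + nxt]
    return lst + [nxt]
print(reduce(combine_nums, string_list, []))
Results:
['T', 'A', 'A', '15', 'A', 'T', 'T']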
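Because the reduce version builds a new list with `lst + [nxt]` at every step, it copies the accumulated list each time. Given the performance concern in the question, a plain loop that updates one list in place may be preferable. This is a sketch of that alternative, not the approach from the answer above, and the name `merge_digit_runs` is hypothetical:

```python
def merge_digit_runs(chars):
    """Join consecutive digit characters; accepts a list of characters or a string."""
    merged = []
    for ch in chars:
        if merged and ch.isdigit() and merged[-1].isdigit():
            merged[-1] += ch   # extend the current run of digits
        else:
            merged.append(ch)  # start a new item
    return merged

print(merge_digit_runs(['T', 'A', 'A', '1', '5', 'A', 'T', 'T']))
# ['T', 'A', 'A', '15', 'A', 'T', 'T']
```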