Can I transform this into a list comprehension? (or make this code faster?)

Question

This code takes a list of strings made of a and b and splits each string into segments. Each segment consists of any number of a followed by any number of b; one letter may not interrupt a run of the other (aaabb is allowed, but aaba is not). Is it possible to turn this into a list comprehension?

list_of_strings=["abaababbaabaaabbabbab","abbabbbabbbaa"]

final_list=[]
for elt in list_of_strings:
    final_list.append([])
    is_a=0
    for idx in range(1,len(elt)):
        if elt[idx] < elt[idx-1]: # find the index where a 'b' is followed by an 'a'
            final_list[-1].append(elt[is_a:idx]) # add the segment to the current sublist of final_list; idx (the new 'a') is not included
            is_a=idx # the next segment begins at the index of the new 'a'
    final_list[-1].append(elt[is_a:idx+1]) # finish with the last segment of the string
print(final_list)

Honestly, I just need to make it faster; any other tips are welcome.

I can't import any libraries.

Answer 1

Emulating what your solution does, splitting when "b" changes to "a":

>>> [s.replace('ba', 'b a').split() for s in list_of_strings]
[['ab', 'aab', 'abb', 'aab', 'aaabb', 'abb', 'ab'],
 ['abb', 'abbb', 'abbb', 'aa']]
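
One caveat: split() with no argument splits on any whitespace, so the one-liner above assumes the strings contain no spaces of their own. If that is not guaranteed, a variant with an explicit separator might look like the sketch below. The function name split_with_sep and the choice of "\0" as separator are illustrative assumptions, not part of the original code.

```python
SEP = "\0"  # assumed never to occur in the input strings

def split_with_sep(s, sep=SEP):
    # Mark every 'b' -> 'a' boundary with the separator, then split on exactly that separator.
    return s.replace("ba", "b" + sep + "a").split(sep)

list_of_strings = ["abaababbaabaaabbabbab", "abbabbbabbbaa"]
final_list = [split_with_sep(s) for s in list_of_strings]
print(final_list)
```

Unlike the whitespace version, this returns [''] rather than [] for an empty string, which may or may not be what you want.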

Answer 2

It seems like a regex might be a better fit here. This looks for one or more a characters followed by zero or more non-a characters:

import re

list_of_strings=["abaababbaabaaabbabbab","abbabbbabbbaa"]

[re.findall(r'a+[^a]*', s) for s in list_of_strings]

# [['ab', 'aab', 'abb', 'aab', 'aaabb', 'abb', 'ab'],
#  ['abb', 'abbb', 'abbb', 'aa']]
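
Note that the pattern above only starts matching at an a, so any b characters at the very start of a string would be dropped. The sample strings all start with a, so it makes no difference here. If leading b characters need to be kept, one alternative is to split at the zero-width position between a b and an a. This is a sketch that assumes Python 3.7 or later, where re.split accepts patterns that match the empty string; the function name split_before_a is made up for illustration.

```python
import re

def split_before_a(s):
    # Split wherever a 'b' is on the left and an 'a' is on the right,
    # without consuming either character (lookbehind + lookahead).
    return re.split(r'(?<=b)(?=a)', s)

list_of_strings = ["abaababbaabaaabbabbab", "abbabbbabbbaa"]
final_list = [split_before_a(s) for s in list_of_strings]

# A string with leading 'b' characters shows the difference between the two patterns.
leading_b_findall = re.findall(r'a+[^a]*', "bbab")
leading_b_split = split_before_a("bbab")
```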

Edit based on comment

Another way to do this that doesn't involve re is to zip() the strings with themselves at an offset of one. This allows you to avoid the complicated indexing and gives you a sliding window of letter pairs.

list_of_strings=["abaababbaabaaabbabbab","abbabbbabbbaa"]

def get_sets(s):
    if len(s) <= 1:
        yield s
        return
    current = ''
    for m,n in zip(s, s[1:]):
        current += m
        if m == 'b' and n == 'a':
            yield current
            current = ''
    yield current + n


[list(get_sets(s)) for s in list_of_strings]

Same result:

[['ab', 'aab', 'abb', 'aab', 'aaabb', 'abb', 'ab'],
 ['abb', 'abbb', 'abbb', 'aa']]
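
Since the question asks specifically for a list comprehension with no imports, the same idea can also be sketched with two comprehensions: one collects the cut points, the other slices between them. The name split_at_cuts is hypothetical, and whether this is actually faster than the loop would need measuring on your real data.

```python
def split_at_cuts(s):
    # Cut points: the start, every index where an 'a' directly follows a 'b', and the end.
    cuts = [0] + [i for i in range(1, len(s)) if s[i - 1] == 'b' and s[i] == 'a'] + [len(s)]
    # Pair each cut point with the next one and slice out the segment between them.
    return [s[start:end] for start, end in zip(cuts, cuts[1:])]

list_of_strings = ["abaababbaabaaabbabbab", "abbabbbabbbaa"]
final_list = [split_at_cuts(s) for s in list_of_strings]
print(final_list)
```

Slicing whole segments avoids building each one character by character with +=, and strings of length 0 or 1 come back as a single-element list, like in get_sets.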

2 answers

solution1
2 ACCEPTED 2020-11-14 23:50:18

solution2
1 2020-11-14 22:42:12
