
Can I turn this into a list comprehension? (Or make this code faster?)

This code takes a list of strings made of a and b and splits each string into sequences. Each sequence can contain any number of a's followed by any number of b's, but one letter cannot interrupt a run of the other (aaabb is allowed, but aaba is not). Is it possible to turn this into a list comprehension?

list_of_strings = ["abaababbaabaaabbabbab", "abbabbbabbbaa"]

final_list = []
for elt in list_of_strings:
    final_list.append([])
    is_a = 0  # start index of the current segment
    for idx in range(1, len(elt)):
        if elt[idx] < elt[idx-1]:  # find an index where a 'b' is followed by an 'a'
            final_list[-1].append(elt[is_a:idx])  # add the segment to the current sublist; idx (the new 'a') is not included
            is_a = idx  # the next segment begins at the index of the new 'a'
    final_list[-1].append(elt[is_a:])  # finish with the last segment; slicing to the end also works for strings shorter than 2 characters
print(final_list)

Honestly, I just need to make it faster; any other tips are welcome.

I can't import any libraries.

Emulating what your solution does, splitting when "b" changes to "a":

>>> [s.replace('ba', 'b a').split() for s in list_of_strings]
[['ab', 'aab', 'abb', 'aab', 'aaabb', 'abb', 'ab'],
 ['abb', 'abbb', 'abbb', 'aa']]
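As a quick sanity check, the one-liner can be compared against the question's loop. This is only a sketch: the function names `split_loop` and `split_replace` are hypothetical, and the loop's final slice is written as `s[start:]` so that it also handles very short strings. The two agree on the sample data, but differ on an empty string.

```python
def split_loop(s):
    """The question's loop wrapped in a function (hypothetical name)."""
    parts = []
    start = 0
    for idx in range(1, len(s)):
        if s[idx] < s[idx - 1]:  # a 'b' is followed by an 'a'
            parts.append(s[start:idx])
            start = idx
    parts.append(s[start:])  # last segment
    return parts


def split_replace(s):
    """The replace/split one-liner (hypothetical name)."""
    return s.replace('ba', 'b a').split()


list_of_strings = ["abaababbaabaaabbabbab", "abbabbbabbbaa"]
for s in list_of_strings:
    assert split_loop(s) == split_replace(s)

# The only difference on a/b input: an empty string.
print(split_loop(""))     # ['']
print(split_replace(""))  # []
```

Note that `split()` with no arguments splits on any whitespace, so this trick assumes the input strings contain no whitespace of their own.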

It seems like a regex might be a better fit here. This looks for one or more a characters followed by zero or more non-a characters:

import re

list_of_strings=["abaababbaabaaabbabbab","abbabbbabbbaa"]

[re.findall(r'a+[^a]*', s) for s in list_of_strings]

# [['ab', 'aab', 'abb', 'aab', 'aaabb', 'abb', 'ab'],
#  ['abb', 'abbb', 'abbb', 'aa']]
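One caveat worth checking: because the pattern requires each match to begin with an a, any b characters at the very start of a string are silently dropped, whereas the question's loop keeps them. The sketch below shows this, along with a possible variant that splits at the zero-width position between a b and an a. The names `split_regex` and `split_lookaround` are hypothetical, and the variant assumes Python 3.7 or later, where `re.split` accepts patterns that match an empty string.

```python
import re


def split_regex(s):
    """The findall pattern from above (hypothetical wrapper name)."""
    return re.findall(r'a+[^a]*', s)


def split_lookaround(s):
    """Assumed variant: split between a 'b' and the 'a' that follows it.

    Requires Python 3.7+ for splitting on an empty match.
    """
    return re.split(r'(?<=b)(?=a)', s)


print(split_regex("bbab"))       # ['ab']        (leading 'bb' is lost)
print(split_lookaround("bbab"))  # ['bb', 'ab']  (leading 'bb' is kept)

# On strings that start with 'a', both give the same segments.
sample = "abbabbbabbbaa"
assert split_regex(sample) == split_lookaround(sample)
```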

Edit based on comment

Another way to do this that doesn't involve re is to zip() the strings with themselves at an offset of one. This allows you to avoid the complicated indexing and gives you a sliding window of letter pairs.
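To make the sliding window concrete, here is a small illustration on a short string (the variable names are just for this sketch). Each pair holds a character and its successor, so a `('b', 'a')` pair marks the point where a new segment starts.

```python
s = "abba"

# Pair every character with the one that follows it.
pairs = list(zip(s, s[1:]))
print(pairs)  # [('a', 'b'), ('b', 'b'), ('b', 'a')]

# Index in s where a new segment begins (the 'a' that follows a 'b').
boundaries = [i + 1 for i, (m, n) in enumerate(pairs) if m == 'b' and n == 'a']
print(boundaries)  # [3]
```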

list_of_strings=["abaababbaabaaabbabbab","abbabbbabbbaa"]

def get_sets(s):
    if len(s) <= 1:
        yield s
        return
    current = ''
    for m,n in zip(s, s[1:]):
        current += m
        if m == 'b' and n == 'a':
            yield current
            current = ''
    yield current + n


[list(get_sets(s)) for s in list_of_strings]

Same result:

[['ab', 'aab', 'abb', 'aab', 'aaabb', 'abb', 'ab'],
 ['abb', 'abbb', 'abbb', 'aa']]
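Since the question is mostly about speed, it may be worth measuring the candidates on your own data rather than guessing. Below is a rough sketch using the standard-library `timeit` module purely as a measurement tool; the wrapper names, the repeated sample string, and the run count are all assumptions chosen for illustration, and the numbers will vary by machine and Python version.

```python
import timeit


def split_replace(s):
    """The replace/split approach (hypothetical wrapper name)."""
    return s.replace('ba', 'b a').split()


def get_sets(s):
    """The zip-based generator from above."""
    if len(s) <= 1:
        yield s
        return
    current = ''
    for m, n in zip(s, s[1:]):
        current += m
        if m == 'b' and n == 'a':
            yield current
            current = ''
    yield current + n


def split_zip(s):
    """List wrapper around get_sets (hypothetical name)."""
    return list(get_sets(s))


# Assumed test input: the first sample string repeated to make it longer.
sample = "abaababbaabaaabbabbab" * 1000

# Both approaches should agree before their speed is compared.
assert split_replace(sample) == split_zip(sample)

timings = {}
for fn in (split_replace, split_zip):
    timings[fn.__name__] = timeit.timeit(lambda: fn(sample), number=20)
    print(f"{fn.__name__}: {timings[fn.__name__]:.4f} s for 20 runs")
```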


 