How to create sublists from list based on start and end elements?

Question

I'm trying to create sublists from a list based on start and end elements, but I am not able to get all occurrences of the start and end elements.

lst  = ['value0','<!program start>','value1','value2','<!program end>',
        'value3','<!program start>','value4','<!program end>','value5']

Expected output:

[['value0'],['<!program start>','value1','value2','<!program end>'],
 ['value3'],['<!program start>','value4','<!program end>'],['value5']]

Code:

start_idx = lst.index('<!program start>')
end_idx = lst.index('<!program end>')
final_result = lst[:start_idx] + [lst[start_idx:end_idx+1]] + lst[end_idx+1:]
print(final_result)

Answer 1

You could process the data with a relatively simple FSM (finite-state machine):

def fsm(lst):
    result = []

    state = 0
    for elem in lst:
        if state == 0:
            result.append([elem])
            state = 1
        elif state == 1:
            if elem == '<!program start>':
                subl = [elem]
                state = 2
            else:
                break  # End of pattern: assumes plain values and blocks strictly alternate.
        elif state == 2:
            subl.append(elem)
            if elem == '<!program end>':
                result.append(subl)
                state = 0

    return result


lst  = ['value0','<!program start>','value1','value2','<!program end>',
        'value3','<!program start>','value4','<!program end>','value5']

print(fsm(lst))
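The FSM above assumes that standalone values and tagged blocks strictly alternate: the break in state 1 stops at the first pair of consecutive plain values, and a list that opens with a start tag gets that tag wrapped as a standalone element. A possible two-state generalisation, written as a generator so sublists can be consumed lazily, is sketched below. The name iter_blocks and the choice to emit an unterminated block as collected are assumptions, not part of the original answer.

```python
from typing import Iterable, Iterator, List, Optional


def iter_blocks(items: Iterable[str],
                start: str = '<!program start>',
                end: str = '<!program end>') -> Iterator[List[str]]:
    """Yield one sublist per standalone item or per start...end block."""
    block: Optional[List[str]] = None  # None means "outside a block" (idle state)
    for item in items:
        if block is None:
            if item == start:
                block = [item]       # transition: idle -> inside block
            else:
                yield [item]         # standalone item, stay idle
        else:
            block.append(item)
            if item == end:
                yield block          # transition: inside block -> idle
                block = None
    if block is not None:
        yield block                  # unterminated block: emit what was collected


lst = ['value0', '<!program start>', 'value1', 'value2', '<!program end>',
       'value3', '<!program start>', 'value4', '<!program end>', 'value5']
print(list(iter_blocks(lst)))
```

Because it only ever looks at one element at a time, the same function also works on any iterable, such as lines read from a file.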

Answer 2

Using iteration:

lst = ['value0', '<!program start>', 'value1', 'value2', '<!program end>',
       'value3', '<!program start>', 'value4', '<!program end>', 'value5']

res = []
start = False
temp = []

for item in lst:
    if item == '<!program start>':
        start = True
        temp.append(item)

    elif item == '<!program end>':
        start = False
        temp.append(item)
        res.append(temp)
        temp = []

    elif start:
        temp.append(item)
    else:
        res.append([item])

print(res)

output:

[['value0'], ['<!program start>', 'value1', 'value2', '<!program end>'], ['value3'], ['<!program start>', 'value4', '<!program end>'], ['value5']]

The start flag tracks whether the current item lies between the opening and closing tags.
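Because the flag only records "inside or outside", malformed input passes silently in the loop above: a stray end tag is stored as its own sublist, a second start tag before an end lands inside the current block, and an unclosed block is dropped because temp is never appended to res. A stricter sketch that checks the flag before acting and raises ValueError instead; the function name split_strict and the error messages are illustrative assumptions.

```python
START, END = '<!program start>', '<!program end>'


def split_strict(lst, start=START, end=END):
    """Flag-based split that rejects malformed tag sequences."""
    res, temp, inside = [], [], False
    for item in lst:
        if item == start:
            if inside:
                raise ValueError(f'nested {start!r} before {end!r}')
            inside, temp = True, [item]
        elif item == end:
            if not inside:
                raise ValueError(f'{end!r} without a matching {start!r}')
            temp.append(item)
            res.append(temp)
            inside, temp = False, []
        elif inside:
            temp.append(item)     # item between the tags
        else:
            res.append([item])    # standalone item
    if inside:
        raise ValueError(f'{start!r} was never closed')
    return res


lst = ['value0', '<!program start>', 'value1', 'value2', '<!program end>',
       'value3', '<!program start>', 'value4', '<!program end>', 'value5']
print(split_strict(lst))
```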

Answer 3

It's not as cool as your one-liner, but it looks like it works:

def process(input_list, start, end):
    output = []
    while len(input_list) != 0:
        if input_list[0] != start:
            # This isn't a start token, so just add it to the output
            output.append([input_list[0]])
            input_list = input_list[1:]
            continue

        # Looks like we've found a start token, look for the end
        # associated with it and append that. NOTE: You could
        # try/except here if you didn't know that the end token was
        # actually there.
        end_index = input_list.index(end)
        output.append(input_list[:end_index + 1])
        input_list = input_list[end_index + 1:]
    return output


print(process(lst, '<!program start>', '<!program end>'))

I get:

[['value0'],
 ['<!program start>', 'value1', 'value2', '<!program end>'],
 ['value3'],
 ['<!program start>', 'value4', '<!program end>'],
 ['value5']]

as the output, which looks right to me.
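Each input_list = input_list[...] assignment in process() copies the rest of the list, so the loop does quadratic work on long inputs. A sketch of the same idea that keeps a position i instead, uses the optional start argument of list.index so the search begins at the current start token, and takes up the try/except suggestion from the comment. The name process_offsets and treating a missing end token as "block runs to the end of the list" are assumptions.

```python
def process_offsets(input_list, start, end):
    """Same output as process(), but walks an index instead of re-slicing."""
    output = []
    i = 0
    while i < len(input_list):
        if input_list[i] != start:
            # Not a start token: emit it on its own.
            output.append([input_list[i]])
            i += 1
            continue
        try:
            # list.index(value, i) only searches from position i onward.
            end_index = input_list.index(end, i)
        except ValueError:
            # Assumption: an unclosed block runs to the end of the list.
            end_index = len(input_list) - 1
        output.append(input_list[i:end_index + 1])
        i = end_index + 1
    return output


lst = ['value0', '<!program start>', 'value1', 'value2', '<!program end>',
       'value3', '<!program start>', 'value4', '<!program end>', 'value5']
print(process_offsets(lst, '<!program start>', '<!program end>'))
```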

Answer 4

The problem with your code is that index() returns only the first matching index, not all of them. This can be done simply with while loops instead:

final_list = []
i = 0
while i < len(lst):
    inner_list = []
    word = lst[i]
    if word == "<!program start>":
        while word != '<!program end>':
            word = lst[i]
            inner_list.append(word)
            i += 1
    else:
        inner_list.append(word)
        i += 1
    final_list.append(inner_list)

print(final_list)
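To stay closer to the index-based idea in the question, a sketch can collect every position of each tag with enumerate (since list.index stops at the first match) and then slice between consecutive start/end pairs. It assumes the tags are balanced, not nested, and that each start comes before its end; split_by_indices is a hypothetical name.

```python
START, END = '<!program start>', '<!program end>'


def split_by_indices(lst, start=START, end=END):
    """Slice lst at every start/end tag pair (assumes balanced, non-nested tags)."""
    # list.index() stops at the first match, so gather every position instead.
    starts = [i for i, x in enumerate(lst) if x == start]
    ends = [i for i, x in enumerate(lst) if x == end]

    result = []
    prev = 0  # first position not yet copied into result
    for s, e in zip(starts, ends):
        result.extend([x] for x in lst[prev:s])  # standalone items before the block
        result.append(lst[s:e + 1])              # the block itself, tags included
        prev = e + 1
    result.extend([x] for x in lst[prev:])       # standalone items after the last block
    return result


lst = ['value0', '<!program start>', 'value1', 'value2', '<!program end>',
       'value3', '<!program start>', 'value4', '<!program end>', 'value5']
print(split_by_indices(lst))
```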

Answer 5

Similar solution with nested while loops.

test_list = ['value0','<!program start>','value1','value2','<!program end>',
             'value3','<!program start>','value4','<!program end>','value5']

answer_list = []
i = 0
while i < len(test_list):
    if test_list[i] == '<!program start>':
        # Collect everything up to (but not including) the end tag.
        sublist = []
        while test_list[i] != '<!program end>':
            sublist.append(test_list[i])
            i += 1
    elif test_list[i] == '<!program end>':
        # Close the current block and store it.
        sublist.append(test_list[i])
        answer_list.append(sublist)
        i += 1
    else:
        # Wrap standalone items in a list so they match the expected output.
        answer_list.append([test_list[i]])
        i += 1

print(answer_list)

Produces:

[['value0'], ['<!program start>', 'value1', 'value2', '<!program end>'], ['value3'], ['<!program start>', 'value4', '<!program end>'], ['value5']]
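The nested while loops in Answers 4 and 5 raise IndexError when a start tag has no matching end tag, because the inner loop keeps incrementing i past the end of the list. A sketch that avoids index bookkeeping by sharing one iterator between an outer and an inner for loop: the inner loop resumes where the outer one stopped, and it simply ends when the iterator is exhausted. The name split_with_iterator and keeping an unclosed block as collected are assumptions.

```python
def split_with_iterator(lst, start='<!program start>', end='<!program end>'):
    result = []
    it = iter(lst)
    for item in it:
        if item != start:
            result.append([item])  # standalone item
            continue
        block = [item]
        # This inner loop advances the same iterator, so the outer loop resumes
        # right after the end tag; it also stops cleanly if the list runs out.
        for inner in it:
            block.append(inner)
            if inner == end:
                break
        result.append(block)  # an unclosed block is kept as collected (assumption)
    return result


test_list = ['value0', '<!program start>', 'value1', 'value2', '<!program end>',
             'value3', '<!program start>', 'value4', '<!program end>', 'value5']
print(split_with_iterator(test_list))
```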

Answer 6

There are also some interesting comprehension-based approaches that rely only on basic str and list processing.

For instance, you could first join lst into one string and split it into chunks on "<!program ", the prefix shared by the start and end tags:

chunks = " ".join(lst).split("<!program ")

Every chunk that came from inside a tagged block starts with "start>", while the others start with "end>" (or, for the first chunk, with no tag remnant at all), which is enough to tell the two cases apart.

A list comprehension then builds the desired output from those chunks:

output = [
    [s.strip('end> ')] if not s.startswith('start>')
    else ["<!program start>"] + s.strip("start> ").split() + ["<!program end>"]
    for s in chunks
]
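The comprehension gives the expected result for the sample data, but str.strip(chars) removes any run of the listed characters from both ends rather than a literal prefix or suffix, so a standalone value such as 'node' would be cut down to 'o'. The join/split step also merges consecutive standalone values into one string, and a list that begins with a start tag produces a leading empty chunk. A quick check of both effects (lst2 is a made-up input), with str.removeprefix (Python 3.9+) shown as the exact-prefix alternative:

```python
chars = 'end> '

# strip() treats its argument as a set of characters to trim from both ends.
print(repr('value0 '.strip(chars)))              # 'value0': fine on the sample
print(repr('end> node'.strip(chars)))            # 'o': the value itself is trimmed

# removeprefix() (Python 3.9+) removes one exact leading substring.
print(repr('end> node'.removeprefix('end> ')))   # 'node'

# Hypothetical input: leading start tag, then two standalone values in a row.
lst2 = ['<!program start>', 'x', '<!program end>', 'a', 'b']
chunks = " ".join(lst2).split("<!program ")
output = [
    [s.strip('end> ')] if not s.startswith('start>')
    else ["<!program start>"] + s.strip("start> ").split() + ["<!program end>"]
    for s in chunks
]
print(output)  # [[''], ['<!program start>', 'x', '<!program end>'], ['a b']]
```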

6 answers

solution1 2 2021-07-12 21:43:17
solution2 1 ACCEPTED 2021-07-12 21:22:09
solution3 0 2021-07-12 21:24:13
solution4 0 2021-07-12 21:36:33
solution5 0 2021-07-12 21:50:15
solution6 0 2021-07-12 22:23:18
