
How to create sublists from a list based on start and end elements?

I am trying to create sublists from a list based on start and end elements, but my code does not pick up every occurrence of the start and end elements.

lst  = ['value0','<!program start>','value1','value2','<!program end>',
        'value3','<!program start>','value4','<!program end>','value5']

Expected output:

[['value0'],['<!program start>','value1','value2','<!program end>'],
 ['value3'],['<!program start>','value4','<!program end>'],['value5']]

Code:

start_idx = lst.index('<!program start>')
end_idx = lst.index('<!program end>')
final_result = lst[:start_idx] + [lst[start_idx:end_idx+1]] + lst[end_idx+1:]
print(final_result)
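list.index returns only the position of the first match, which is why the code above handles just one block. The sketch below keeps the slicing idea but collects every start and end position with enumerate. It assumes the tags are well formed (each start has a matching end, no nesting), and the name split_by_indices is hypothetical.

```python
def split_by_indices(lst, start='<!program start>', end='<!program end>'):
    # Positions of every start and end tag, not just the first ones.
    starts = [i for i, x in enumerate(lst) if x == start]
    ends = [i for i, x in enumerate(lst) if x == end]

    result = []
    pos = 0  # first index not yet copied into result
    for s, e in zip(starts, ends):
        # Items before this block each become a one-element sublist.
        result.extend([x] for x in lst[pos:s])
        # The block itself, tags included.
        result.append(lst[s:e + 1])
        pos = e + 1
    # Items after the last block.
    result.extend([x] for x in lst[pos:])
    return result


lst = ['value0', '<!program start>', 'value1', 'value2', '<!program end>',
       'value3', '<!program start>', 'value4', '<!program end>', 'value5']
print(split_by_indices(lst))
```

Pairing starts and ends with zip relies on the well-formedness assumption; an unmatched tag would pair the wrong positions.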

You could process the data with a relatively simple FSM (finite state machine):

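def fsm(lst, start='<!program start>', end='<!program end>'):
    result = []
    subl = []

    state = 0  # 0: outside a start/end block, 1: inside one
    for elem in lst:
        if state == 0:
            if elem == start:
                subl = [elem]  # a start tag opens a new block
                state = 1
            else:
                result.append([elem])  # plain element: its own sublist
        elif state == 1:
            subl.append(elem)
            if elem == end:
                result.append(subl)  # the end tag closes the block
                state = 0

    if state == 1:
        result.append(subl)  # keep an unterminated block rather than drop it

    return result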


lst  = ['value0','<!program start>','value1','value2','<!program end>',
        'value3','<!program start>','value4','<!program end>','value5']

print(fsm(lst))
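The same state machine can also be written as a generator, which makes the two states explicit and yields sublists lazily. This is a minimal sketch assuming tags are never nested; split_blocks is a hypothetical name, and emitting an unterminated block at the end of the input is a design choice rather than something the question specifies.

```python
from typing import Iterable, Iterator, List, Optional


def split_blocks(items: Iterable[str],
                 start: str = '<!program start>',
                 end: str = '<!program end>') -> Iterator[List[str]]:
    # State: None means "outside a block"; a list means "inside a block".
    block: Optional[List[str]] = None
    for item in items:
        if block is None:
            if item == start:
                block = [item]      # transition: outside -> inside
            else:
                yield [item]        # plain item, stay outside
        else:
            block.append(item)
            if item == end:
                yield block         # transition: inside -> outside
                block = None
    if block is not None:
        yield block                 # unterminated block at end of input


lst = ['value0', '<!program start>', 'value1', 'value2', '<!program end>',
       'value3', '<!program start>', 'value4', '<!program end>', 'value5']
print(list(split_blocks(lst)))
```

Because it is a generator, it also works on any iterable (for example lines read from a file) without building the whole input list first.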

Using iteration:

lst = ['value0', '<!program start>', 'value1', 'value2', '<!program end>',
       'value3', '<!program start>', 'value4', '<!program end>', 'value5']

res = []
start = False
temp = []

for item in lst:
    if item == '<!program start>':
        start = True
        temp.append(item)

    elif item == '<!program end>':
        start = False
        temp.append(item)
        res.append(temp)
        temp = []

    elif start:
        temp.append(item)
    else:
        res.append([item])

print(res)

Output:

[['value0'], ['<!program start>', 'value1', 'value2', '<!program end>'], ['value3'], ['<!program start>', 'value4', '<!program end>'], ['value5']]

The start flag tracks whether the current item lies between a start tag and its closing end tag.
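To see how the flag evolves, the sketch below mirrors the loop above but records, for each item, whether it is treated as part of a block. The name trace_flag is hypothetical; the tags themselves are marked as inside because the loop above appends them to temp.

```python
def trace_flag(lst, start_tag='<!program start>', end_tag='<!program end>'):
    start = False
    trace = []
    for item in lst:
        if item == start_tag:
            start = True
            trace.append((item, True))   # the start tag belongs to the block
        elif item == end_tag:
            start = False
            trace.append((item, True))   # so does the end tag
        else:
            trace.append((item, start))  # other items follow the flag
    return trace


lst = ['value0', '<!program start>', 'value1', 'value2', '<!program end>',
       'value3', '<!program start>', 'value4', '<!program end>', 'value5']
for item, inside in trace_flag(lst):
    print(item, inside)
```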

It's not as cool as your one-liner, but it looks like it works:

from pprint import pprint


def process(input_list, start, end):
    output = []
    while input_list:
        if input_list[0] != start:
            # This isn't a start token, so just add it to the output
            output.append([input_list[0]])
            input_list = input_list[1:]
            continue

        # Looks like we've found a start token, look for the end
        # associated with it and append that. NOTE: list.index raises
        # ValueError if the end token is missing, so you could
        # try/except here if you didn't know that it was actually there.
        end_index = input_list.index(end)
        output.append(input_list[:end_index + 1])
        input_list = input_list[end_index + 1:]
    return output


pprint(process(lst, '<!program start>', '<!program end>'))

I get:

[['value0'],
 ['<!program start>', 'value1', 'value2', '<!program end>'],
 ['value3'],
 ['<!program start>', 'value4', '<!program end>'],
 ['value5']]

as the output, which looks right to me.
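Re-slicing input_list on every step copies the remaining list each time, so the approach is quadratic in the worst case. The sketch below keeps the same logic but walks an index instead, and applies the try/except suggested in the NOTE comment; list.index accepts a start position as its second argument. The name process_indexed and the choice to keep an unterminated block as-is are assumptions.

```python
def process_indexed(input_list, start, end):
    output = []
    i = 0
    while i < len(input_list):
        if input_list[i] != start:
            # Not a start token: emit it on its own.
            output.append([input_list[i]])
            i += 1
            continue
        try:
            # Search for the end token only from the current start onwards.
            end_index = input_list.index(end, i)
        except ValueError:
            # No closing token: keep the rest of the list as one block.
            end_index = len(input_list) - 1
        output.append(input_list[i:end_index + 1])
        i = end_index + 1
    return output


lst = ['value0', '<!program start>', 'value1', 'value2', '<!program end>',
       'value3', '<!program start>', 'value4', '<!program end>', 'value5']
print(process_indexed(lst, '<!program start>', '<!program end>'))
```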

The problem with your code is that list.index returns only the first matching index, not all of them. This can be done simply with nested while loops:

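final_list = []
i = 0
while i < len(lst):
    inner_list = []
    word = lst[i]
    if word == '<!program start>':
        # Collect items up to and including the matching end tag
        # (also stop at the end of the list if the end tag is missing).
        while i < len(lst) and word != '<!program end>':
            word = lst[i]
            inner_list.append(word)
            i += 1
    else:
        # Any item outside a block becomes its own one-element sublist.
        inner_list.append(word)
        i += 1
    final_list.append(inner_list)

print(final_list)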
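The same nested-loop idea can be written without manual index bookkeeping by letting the inner loop consume from the same iterator as the outer loop. This is a sketch assuming tags are not nested; split_with_iterator is a hypothetical name.

```python
def split_with_iterator(lst, start='<!program start>', end='<!program end>'):
    result = []
    it = iter(lst)
    for word in it:
        if word != start:
            result.append([word])
            continue
        inner_list = [word]
        # The inner loop advances the same iterator, so the outer loop
        # resumes right after the end tag.
        for item in it:
            inner_list.append(item)
            if item == end:
                break
        result.append(inner_list)
    return result


lst = ['value0', '<!program start>', 'value1', 'value2', '<!program end>',
       'value3', '<!program start>', 'value4', '<!program end>', 'value5']
print(split_with_iterator(lst))
```

If the end tag is missing, the inner loop simply exhausts the iterator and the partial block is kept.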

A similar solution with nested while loops:

test_list = ['value0', '<!program start>', 'value1', 'value2', '<!program end>',
             'value3', '<!program start>', 'value4', '<!program end>', 'value5']

answer_list = []
i = 0
while i < len(test_list):
    if test_list[i] == '<!program start>':
        # Collect the start tag and everything up to (not including) the end tag.
        sublist = []
        while test_list[i] != '<!program end>':
            sublist.append(test_list[i])
            i += 1
    elif test_list[i] == '<!program end>':
        # Close the current block and store it.
        sublist.append(test_list[i])
        answer_list.append(sublist)
        i += 1
    else:
        # Wrap plain items so they match the expected [['value0'], ...] shape.
        answer_list.append([test_list[i]])
        i += 1

print(answer_list)

Produces:

[['value0'], ['<!program start>', 'value1', 'value2', '<!program end>'], ['value3'], ['<!program start>', 'value4', '<!program end>'], ['value5']]

There are also some interesting comprehension-based approaches that make use of basic str and list processing.

For instance, you could first separate your lst into chunks based on a generalized substring of the start and end tags:

chunks = " ".join(lst).split("<!program ")

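These chunks already carry what distinguishes single elements from elements between tags: each chunk after the first starts with either 'start>' (the values inside a block) or 'end>' (the values following a block), while the first chunk holds whatever precedes the first start tag.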
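For the sample list from the question, the split produces these chunks:

```python
lst = ['value0', '<!program start>', 'value1', 'value2', '<!program end>',
       'value3', '<!program start>', 'value4', '<!program end>', 'value5']

# Join with spaces, then cut at the shared prefix of both tags.
chunks = " ".join(lst).split("<!program ")
print(chunks)
# ['value0 ', 'start> value1 value2 ', 'end> value3 ', 'start> value4 ', 'end> value5']
```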

A list comp is a nice and elegant way of obtaining the desired output:

output = [[s.strip('end> ')] if not s.startswith('start>')
          else ["<!program start>"] + s.strip("start> ").split() + ["<!program end>"]
          for s in chunks]
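One caveat: str.strip treats its argument as a set of characters to remove from both ends, not as a literal prefix, so a value like 'data' inside a block would come out as 'd'. Also, several plain values in a row between blocks end up in a single string. The variant below uses str.removeprefix (Python 3.9+) and gives each plain value its own sublist. It still assumes that no value contains a space, and split_via_strings is a hypothetical name.

```python
def split_via_strings(lst, prefix="<!program "):
    # Same chunking step as above.
    chunks = " ".join(lst).split(prefix)
    output = []
    for s in chunks:
        if s.startswith("start>"):
            # Text between a start tag and its end tag.
            values = s.removeprefix("start>").split()
            output.append([prefix + "start>"] + values + [prefix + "end>"])
        else:
            # Text before the first block or after an end tag.
            for value in s.removeprefix("end>").split():
                output.append([value])
    return output


lst = ['value0', '<!program start>', 'value1', 'value2', '<!program end>',
       'value3', '<!program start>', 'value4', '<!program end>', 'value5']
print(split_via_strings(lst))
```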
