
Finding block of lines starting with a specific character

(I edited the question for clarification)

I would appreciate suggestions on how to implement the following in Python. Given the text

> first
> second
third
fourth
> fifth
> sixth
> seventh

I would like to get two subtexts:

first
second

and

fifth
sixth
seventh

That is, given some lines of text as input, the output should be the blocks of lines that start with >. A "block" here is a run of consecutive lines that all start with >. In the example above, the third line doesn't start with >, so it ends the first block after the first two lines. The second block then begins at the next line that starts with >, i.e. the fifth line.
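To make the definition concrete, here is a minimal sketch that reads a "block" as a maximal run of consecutive lines starting with >. It uses itertools.groupby from the standard library; the function name quoted_blocks and the choice to strip the leading marker and surrounding whitespace are my own assumptions for illustration.

```python
from itertools import groupby

def quoted_blocks(text, sep=">"):
    """Return each run of consecutive lines starting with `sep` as one string."""
    blocks = []
    # groupby splits the lines into runs that share the same key (True/False)
    for is_quoted, run in groupby(text.splitlines(), key=lambda line: line.startswith(sep)):
        if is_quoted:
            # Drop the marker and surrounding whitespace from every line in the run
            blocks.append("\n".join(line[len(sep):].strip() for line in run))
    return blocks

text = "> first\n> second\nthird\nfourth\n> fifth\n> sixth\n> seventh"
print(quoted_blocks(text))
# ['first\nsecond', 'fifth\nsixth\nseventh']
```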

I decided to use a brute-force approach to solve the issue. It's not elegant, but it works (the code using consecutive_groups was taken from an answer to this question):

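from more_itertools import consecutive_groups

def get_block_ids(s, sep='>'):
    # Split the string into lines; enumerating the raw string would yield characters
    lines = s.splitlines()
    # Indices of the lines that start with the separator (empty lines never match)
    idx = [i for i, line in enumerate(lines) if line.startswith(sep)]
    # Group runs of consecutive indices, then keep the first and last index of each run
    idx_grouped = [list(group) for group in consecutive_groups(idx)]
    idx_ranges = [(g[0], g[-1]) for g in idx_grouped]
    return idx_ranges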

The function get_block_ids returns a list of tuples, each containing the zero-based indices of the first and last line of a block found in the string s.
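To go from those index ranges to the subtexts I am after, something like the following sketch should work. The helper extract_blocks is a hypothetical name, not part of more_itertools, and the ranges are written out by hand here as the values I would expect for the example text, so the snippet runs without any third-party package.

```python
def extract_blocks(lines, ranges, sep=">"):
    """Join lines[first..last] for each (first, last) pair, dropping the leading `sep`."""
    blocks = []
    for first, last in ranges:
        # `last` is inclusive, so the slice has to end at last + 1
        stripped = [line[len(sep):].strip() for line in lines[first:last + 1]]
        blocks.append("\n".join(stripped))
    return blocks

lines = ["> first", "> second", "third", "fourth", "> fifth", "> sixth", "> seventh"]
ranges = [(0, 1), (4, 6)]  # assumed index ranges for the example text
print(extract_blocks(lines, ranges))
# ['first\nsecond', 'fifth\nsixth\nseventh']
```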
