(I edited the question for clarification)
I would appreciate suggestions on how to implement the following in Python. Given the text
> first
> second
third
fourth
> fifth
> sixth
> seventh
I would like to get two subtexts:
first
second
and
fifth
sixth
seventh
That is, given some lines of text as input, the output should be the blocks of lines that start with >. A "block" in my definition here is a set of consecutive lines that all start with >. In the example above, since the third line doesn't start with >, it "cuts" the two lines above it into a single block. The second block then starts at the next line that starts with >, i.e. the fifth line.
I decided to use a brute-force approach to solve the issue. It's not elegant, but it works (the code using consecutive_groups was taken from an answer to this question):
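from more_itertools import consecutive_groups

def get_block_ids(s, sep='>'):
    # Indices of the non-empty lines that start with sep
    idx = [i for i, line in enumerate(s) if line != '' and line[0] == sep]
    # Split the indices into runs of consecutive numbers, one run per block
    idx_grouped = [list(group) for group in consecutive_groups(idx)]
    # Keep only the first and last index of each block
    idx_ranges = [(g[0], g[-1]) for g in idx_grouped]
    return idx_ranges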
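The function get_block_ids returns a list of tuples, each one containing the indices of the first and last line of the respective block. Note that s is iterated line by line, so it should be a list of lines (for example, the text split on newlines) rather than a single string.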
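One possible alternative, as a rough sketch only: the standard library's itertools.groupby can group adjacent lines by whether they start with >, which yields the blocks themselves rather than their indices and avoids the more_itertools dependency. The names get_blocks and text are made up for illustration, and the sketch assumes that the > marker and the surrounding whitespace should be stripped from each line, as in the desired output above.

```python
from itertools import groupby


def get_blocks(lines, sep='>'):
    """Return each run of consecutive lines starting with `sep`.

    Hypothetical helper: the marker and surrounding whitespace are
    stripped from every line (an assumption based on the desired output).
    """
    blocks = []
    # groupby yields runs of adjacent lines that share the same key,
    # here True (starts with sep) or False (does not).
    for is_marked, run in groupby(lines, key=lambda line: line.startswith(sep)):
        if is_marked:
            blocks.append([line[len(sep):].strip() for line in run])
    return blocks


text = """> first
> second
third
fourth
> fifth
> sixth
> seventh"""

blocks = get_blocks(text.splitlines())
print(blocks)
# [['first', 'second'], ['fifth', 'sixth', 'seventh']]
```

Because groupby only merges adjacent items, any line that does not start with > (including an empty line) ends the current block, which matches the definition of a block given above.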