Extracting lines of a text file between delimiters into a list Python

Question

I have a large text file with lines in this format:

DELIM
filename1
information
information
DELIM
filename2
information
information
information
information
DELIM

and so on, where the number of lines between the delimiters varies. How can I collect everything between each pair of delimiters into a list?

Answer 1

Provided that DELIM cannot be found in the in-between lines, you could do that quite easily by:

reading the whole file into memory (not practical if your file holds 20 TB of data, but fine for reasonably sized files)
splitting the text on DELIM with str.split
splitting each block and filtering out blanks (artifacts of split) in a list comprehension

My proposal:

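with open("file.txt") as f:
    # split on DELIM, skip blank fragments (e.g. the "\n" after the last DELIM), then split each block into its lines
    lines = [x.strip().splitlines() for x in f.read().split("DELIM") if x.strip()]

print(lines)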

Result with your input (a list of lists of lines):

[['filename1', 'information', 'information'], ['filename2', 'information', 'information', 'information', 'information']]
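A self-contained sketch of the same read-and-split idea, assuming an in-memory io.StringIO stands in for file.txt so it runs without a file (the sample text is hypothetical). Each block is split on line breaks rather than on whitespace, so an information line that contains spaces stays a single entry.

```python
import io

# Hypothetical in-memory stand-in for file.txt, mirroring the question's layout
sample = io.StringIO(
    "DELIM\n"
    "filename1\n"
    "information\n"
    "more information\n"
    "DELIM\n"
    "filename2\n"
    "information\n"
    "DELIM\n"
)

# Split the whole text on DELIM, drop blank fragments ("" before the first DELIM,
# "\n" after the last one), then split each remaining block into its lines
blocks = [
    block.strip().splitlines()
    for block in sample.read().split("DELIM")
    if block.strip()
]
print(blocks)
```

Splitting with x.split() instead would turn "more information" into two separate entries.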

Edit: with a big file, you could use itertools.groupby as follows (this avoids reading the whole file at once):

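import itertools
with open("file.txt") as f:
    # k is True for runs of DELIM lines; keep the other runs, minus their trailing newlines
    lines = [[x.rstrip("\n") for x in v] for k, v in itertools.groupby(f, key=lambda x: x.strip() == "DELIM") if not k]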

groupby groups consecutive non-DELIM lines together and consecutive DELIM lines together, each run with a True/False key. We filter out the runs whose key is True (the DELIM lines) and convert the rest to lists, which gives the same value as the previous code. The difference is that the file is read line by line instead of being loaded as one string first, so this also works with a huge file (the resulting list itself still holds all the blocks).
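If only one block at a time is needed, the groupby approach can be wrapped in a generator so the full list is never built. This is a sketch: iter_blocks and its delim parameter are hypothetical names, and io.StringIO stands in for the real file.

```python
import io
import itertools


def iter_blocks(lines, delim="DELIM"):
    """Yield each run of non-delimiter lines as a list, without trailing newlines.

    iter_blocks is a hypothetical helper for illustration, not a library function.
    """
    # groupby yields (key, group) for each run of consecutive lines sharing a key;
    # the key is True for a delimiter line and False for an ordinary line
    for is_delim, group in itertools.groupby(lines, key=lambda line: line.strip() == delim):
        if not is_delim:
            yield [line.rstrip("\n") for line in group]


# Hypothetical in-memory stand-in for file.txt, mirroring the question's layout
sample = io.StringIO(
    "DELIM\n"
    "filename1\n"
    "information\n"
    "DELIM\n"
    "filename2\n"
    "information\n"
    "information\n"
    "DELIM\n"
)

for block in iter_blocks(sample):
    print(block)
```

With a real file, `with open("file.txt") as f:` followed by `for block in iter_blocks(f):` reads the file lazily, and only the current block is held in memory.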

solution1
2 2017-02-12 21:33:15