
Extracting lines of a text file between delimiters into a list in Python

I have a large text file with lines in this format:

DELIM
filename1
information
information
DELIM
filename2
information
information
information
information
DELIM

and so on, where the amount of data in between the delimiters varies. How do I go about writing everything between the delimiters as a list?

Provided that DELIM cannot be found in the in-between lines, you could do that quite easily by:

  • reading your file fully (impractical if your file holds 20 TB of data, but fine for reasonably sized files)
  • applying str.split on DELIM
  • splitting each block into lines and filtering out the blanks (artifacts of split) in a list comprehension
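
To see where those blank artifacts come from, here is a small sketch that uses an in-memory string in place of the file. The sample text and the names text, chunks and artifacts are made up for illustration:

```python
# Sample data modelled on the question, held in a string instead of a file.
text = "DELIM\nfilename1\ninformation\ninformation\nDELIM\nfilename2\ninformation\nDELIM\n"

# Splitting on the delimiter leaves an empty string before the first DELIM
# and a lone newline after the last one.
chunks = text.split("DELIM")
print(chunks)
# ['', '\nfilename1\ninformation\ninformation\n', '\nfilename2\ninformation\n', '\n']

# These whitespace-only chunks are the artifacts that need filtering out.
artifacts = [chunk for chunk in chunks if not chunk.strip()]
print(artifacts)
# ['', '\n']
```

Note that the trailing '\n' chunk is a non-empty string, so a plain truthiness test would not catch it; testing chunk.strip() catches both artifacts.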

My proposal:

with open("file.txt") as f:
    # Skip whitespace-only chunks left by split, then break each block into lines
    lines = [x.strip().splitlines() for x in f.read().split("DELIM") if x.strip()]

print(lines)

result with your input (as a list of lists of lines):

[['filename1', 'information', 'information'], ['filename2', 'information', 'information', 'information', 'information']]

Edit: with a big file, you could use itertools.groupby as follows (this avoids reading the whole file at once):

import itertools

with open("file.txt") as f:
    lines = [[line.rstrip("\n") for line in v] for k, v in itertools.groupby(f, key=lambda x: x.strip() == "DELIM") if not k]

groupby groups consecutive non-delimiter lines together, and consecutive delimiter lines together as well, under a True/False key. We filter out the True key, which corresponds to the DELIM groups, and convert each remaining group to a list. This gives the same list-of-lists structure as the previous code, but the file is read line by line rather than loaded up front, so it works with a huge file as well.
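
To make the True/False keys visible, here is a minimal sketch that prints every group groupby produces. It assumes io.StringIO as a stand-in for the open file, and the sample text and the helper name is_delim are made up for illustration:

```python
import io
import itertools

# io.StringIO stands in for the open file; iterating it yields lines the same way.
f = io.StringIO(
    "DELIM\nfilename1\ninformation\nDELIM\nfilename2\ninformation\ninformation\nDELIM\n"
)

def is_delim(line):
    # Key function: True for delimiter lines, False for everything else.
    return line.strip() == "DELIM"

# Collect every group, including the DELIM ones, to show what gets filtered out.
groups = [
    (key, [line.rstrip("\n") for line in group])
    for key, group in itertools.groupby(f, key=is_delim)
]

for key, group in groups:
    print(key, group)
# True ['DELIM']
# False ['filename1', 'information']
# True ['DELIM']
# False ['filename2', 'information', 'information']
# True ['DELIM']
```

Only the groups whose key is False carry data, which is why the comprehension keeps the ones matching "if not k".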


 