Reading file with custom delimiter

I am trying to parse a file whose input is split into groups by delimiter lines. Is there an efficient way to parse such a file?

Input:

    ABCD
    XYZ
    %
    Hello
    World
    %%
    XXX
    YYY
    ZZZ

Expected output: ['ABCDXYZ','HelloWorld','XXXYYYZZZ']

My code only gives me a list of all the words: ['ABCD','XYZ','Hello','World','XXX','YYY','ZZZ']. Code:

op = []
with open('random_input','r') as fh:
    for line in fh:
        if line.rstrip()=='%':
            continue
        else:
            op.append(line.rstrip())
print(op)

Is there a way to get the expected output ['ABCDXYZ','HelloWorld','XXXYYYZZZ']?

First you need to split the input on one or more % characters and then remove white space from each part:

import re

text = """ABCD
    XYZ
    %
    Hello
    World
    %%
    XXX
    YYY
    ZZZ"""

parts = [re.sub(r'\s+', '', part) for part in re.split(r'%+', text)]
print(parts)

Prints:

['ABCDXYZ', 'HelloWorld', 'XXXYYYZZZ']

So, first read the entire file into variable text and process as above.
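The step of reading the file into text can be sketched as follows. This is only a minimal sketch: the file name random_input comes from the question, while the helper name split_groups is made up for illustration, and the sample input is written to a temporary directory so that the example runs on its own.

```python
import os
import re
import tempfile

def split_groups(path):
    """Read the whole file, split on runs of '%', and drop all whitespace."""
    with open(path, 'r') as fh:
        text = fh.read()
    return [re.sub(r'\s+', '', part) for part in re.split(r'%+', text)]

# Write the sample input from the question to a temporary file for the demo.
sample = "ABCD\nXYZ\n%\nHello\nWorld\n%%\nXXX\nYYY\nZZZ\n"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'random_input')
    with open(path, 'w') as fh:
        fh.write(sample)
    parts = split_groups(path)

print(parts)  # ['ABCDXYZ', 'HelloWorld', 'XXXYYYZZZ']
```

Note that if the file ends with a delimiter line, re.split returns a trailing empty string, which you may want to filter out.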

If there is an absolute need to ensure that the % characters are on a line by themselves, then use:

parts = [re.sub(r'\s+', '', part) for part in re.split(r'^\s*%+\s*$', text, flags=re.M)]
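To see the difference between the two patterns, here is a small comparison on a made-up input (not from the question) in which a % also occurs in the middle of a line. The names loose and strict are illustrative only.

```python
import re

# Hypothetical input: the second line contains a '%' that is not a delimiter.
text = "ABCD\nXY%Z\n%\nHello\nWorld"

# Splits on every run of '%', wherever it occurs.
loose = [re.sub(r'\s+', '', p) for p in re.split(r'%+', text)]

# Splits only where the '%' characters are on a line by themselves.
strict = [re.sub(r'\s+', '', p)
          for p in re.split(r'^\s*%+\s*$', text, flags=re.M)]

print(loose)   # ['ABCDXY', 'Z', 'HelloWorld']
print(strict)  # ['ABCDXY%Z', 'HelloWorld']
```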

Note that the above removes all whitespace, including any whitespace between words on a line. I did this because, the way your question was posted, the input appeared to have leading whitespace. If your intention was just to join lines, then use the following:

parts = [part.replace('\n', '') for part in re.split(r'(?:^\s*%+\s*\n)+', text, flags=re.M)]

op = []
with open('random_input','r') as fh:
    for line in fh:
        if line.rstrip()!='%':
            op.append(line.rstrip())
            
print(op)

Try it this way.

Combine the strings before you store them in the list:

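op = []
string = ''  # accumulates the lines of the current group
with open('random_input', 'r') as fh:
    for line in fh:
        if line.strip().startswith('%'):
            # delimiter line: store the finished group and start a new one
            if string:
                op.append(string)
            string = ''
        else:
            string = string + line.strip()

# store the last group, which has no delimiter line after it
if string:
    op.append(string)

print(op)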
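The same grouping can also be written with itertools.groupby, which removes the manual bookkeeping and collects the final group even though no delimiter follows it. This is a minimal sketch, assuming that a delimiter line consists solely of % characters once surrounding whitespace is stripped; join_groups and is_delimiter are illustrative names, not taken from the answers above.

```python
from itertools import groupby

def is_delimiter(line):
    """True for a line made up only of '%' characters (assumed delimiter rule)."""
    return line != '' and set(line) == {'%'}

def join_groups(lines):
    """Concatenate the lines of each run between delimiter lines."""
    stripped = (line.strip() for line in lines)
    return [''.join(group)
            for delim, group in groupby(stripped, key=is_delimiter)
            if not delim]

# The sample input from the question, as a list of lines.
sample = ["ABCD\n", "XYZ\n", "%\n", "Hello\n", "World\n",
          "%%\n", "XXX\n", "YYY\n", "ZZZ\n"]
result = join_groups(sample)
print(result)  # ['ABCDXYZ', 'HelloWorld', 'XXXYYYZZZ']
```

An open file object can be passed instead of the list, for example join_groups(fh) inside the with open(...) block, since iterating over a file yields its lines.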
