
Trying to skip lines that match regex when writing to file, but new file has extra new lines

This is a spin-off of a question I asked here.

I'm trying to set up a method that can edit text files based on an input dictionary. This is what I have so far:

import re

info = {'#check here 1': {'action': 'read'}, '#check here 2': {'action': 'delete'}}

# Capture the trailing comment on a line ("." does not match the newline)
search_pattern = re.compile(r'.*(#.+)')

# input_file_name and output_file_name are assumed to be defined elsewhere
with open(input_file_name, "r") as old_file, open(output_file_name, "w+") as new_file:
    lines = old_file.readlines()

    for line in lines:
        edit_point = search_pattern.search(line)
        if edit_point:
            result = edit_point.group(1)
            if result in info and info[result]["action"] == "insert":  # insert new lines to file
                print("insert information to file")
                new_file.write("\n".join([str(n) for n in info[result]["new_lines"]]))
                new_file.write(result)
            elif result in info and info[result]["action"] == "delete":  # skip lines with delete action
                print("found deletion point. skipping line")
            else:  # write to file any line with a comment that is not in info
                new_file.write(line)
        else:  # write lines that do not match the regex (#.+)
            new_file.write(line)

Basically, when you submit the dictionary, the program will iterate through the file, searching for comments. If the comment is in the dictionary, it will check the corresponding action. If the action is to insert, it will write the lines to the file. If it is delete, it will skip that line. Any line that does not have a comment should be written to the new file.

My problem is that when I delete a line from the file, extra newlines appear where the line used to be. For example, if I have this file:

hello world

how are you #keep this
I'm fine #check here 2
whats up

I expect the output to be:

hello world

how are you #keep this
whats up

But I instead have a blank line there:

hello world

how are you #check here 2

whats up

I suspect that it is my final else statement, which writes to the file any line that does not match edit_point, in this case newlines. However, my understanding is that the for loop should go line by line and simply skip that line. Can anyone tell me what I'm missing here?
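
One way to check where the blank line comes from is to run the same matching logic on in-memory lines instead of files. This is only a minimal sketch: filter_lines is a hypothetical helper, the sample string is the example file from above, and only the delete action is handled. readlines() keeps the trailing "\n" on every line, so skipping a matched line drops its newline too.

```python
import io
import re

# Same pattern and dictionary as in the question
search_pattern = re.compile(r'.*(#.+)')
info = {'#check here 1': {'action': 'read'}, '#check here 2': {'action': 'delete'}}

# The example file from the question, as a single string
sample = "hello world\n\nhow are you #keep this\nI'm fine #check here 2\nwhats up\n"


def filter_lines(lines, info):
    """Hypothetical helper: return the lines that survive the 'delete' action."""
    kept = []
    for line in lines:
        match = search_pattern.search(line)
        if match and info.get(match.group(1), {}).get("action") == "delete":
            continue  # drops the whole line, trailing "\n" included
        kept.append(line)
    return kept


lines = io.StringIO(sample).readlines()
print([repr(l) for l in lines])  # every element still ends with "\n"

result = "".join(filter_lines(lines, info))
print(repr(result))
```

On this sample the only blank line left is the one that was already in the input. If the real file behaves differently, it may be worth printing repr(line) for each line: trailing whitespace after the comment, for instance, would make group(1) differ from the dictionary key.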

That looks a little tangled: you're mixing the reading and writing logic with the processing logic, which makes it difficult to keep track of what's going on. Try this approach instead:

from enum import Enum
from typing import Dict, List


class Action(Enum):
    KEEP = "keep"
    REMOVE = "remove"


definition = {
    "#KEEP": {"action": Action.KEEP},
    "#REMOVE": {"action": Action.REMOVE},
}


def clean_comments(
    lines: List[str], definition: Dict[str, Dict[str, Action]]
) -> List[str]:

    # Keep a list of the lines that should be in the output
    output: List[str] = []

    # Loop the lines
    for line in lines:

        # If any of the comments in the definition is found, process further
        if any(comment in line for comment in definition):

            # Figure out what to do
            for comment, details in definition.items():
                if comment in line:

                    if details["action"] == Action.KEEP:
                        output.append(line)
                        break

                    elif details["action"] == Action.REMOVE:
                        break

        # Keep all other lines
        else:
            output.append(line)

    return output


# Your data here...
with open("test_input.txt", "r") as f:
    lines = f.readlines()

# Use the function to clean the text
clean_text = "".join(clean_comments(lines, definition))

# Show the output
print(clean_text)

# Write to file
with open("test.txt", "w") as f:
    f.write(clean_text)

Output:

hello world

how are you #KEEP: This line will be kept in the output file
whats up
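
The same separation could probably be carried over to the dictionary format from the question, including its insert action. This is a rough sketch, not a definitive implementation: apply_actions is a hypothetical name, and it assumes that each entry of new_lines should be written on its own line before the marked line (the question does not say where the new lines should go).

```python
import re
from typing import Dict, List

# Same pattern as in the question
COMMENT_PATTERN = re.compile(r".*(#.+)")


def apply_actions(lines: List[str], info: Dict[str, dict]) -> List[str]:
    """Hypothetical helper: process lines in memory, without any file I/O."""
    output: List[str] = []
    for line in lines:
        match = COMMENT_PATTERN.search(line)
        # rstrip() so trailing spaces after the comment do not break the lookup
        details = info.get(match.group(1).rstrip(), {}) if match else {}
        action = details.get("action")

        if action == "delete":
            continue  # skip the line together with its newline
        if action == "insert":
            # Assumption: new lines go before the marked line, one per line
            output.extend(f"{new_line}\n" for new_line in details["new_lines"])
        output.append(line)
    return output


info = {
    "#check here 1": {"action": "insert", "new_lines": ["first", "second"]},
    "#check here 2": {"action": "delete"},
}
lines = ["a #check here 1\n", "b #check here 2\n", "c\n"]
result = apply_actions(lines, info)
print("".join(result), end="")
```

Because the function only takes and returns lists of strings, it can be tested without touching the disk, and the reading and writing stay in the two with blocks shown earlier.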
