Delete lines from text file using regex sub

I have a text file that consists of lines like this:

Fruit=Apple
Id=#1001
Weight=7
Color=Red
...
Fruit=Watermelon
Id=#1002
Weight=20
Color=Green
...
Fruit=Cherry
Id=#1003
...

I am trying to delete all the lines pertaining to a fruit, given the ID of the fruit to delete. So, if I read in #1002, I want to delete all the lines from Fruit=Watermelon up to (but not including) Fruit=Cherry. I don't know how many lines of info each fruit will have, and the number varies from fruit to fruit.

I have tried using regex via the following logic:

repl_string = "Fruit=(.*?)\nId=" + user_inputted_id_to_match + "\n(.*)(?=\nFruit=)"
re.sub(repl_string, "\n", text_file_as_string)

Basically, I am matching the Fruit line, the Id line with what the user gives me and then everything else up to a lookahead for the next Fruit line. Does that make sense?

I ran that, and the resulting text file only has Id's value removed:

Fruit=Apple
Id=#1001
Weight=7
Color=Red
...
Fruit=Watermelon
Id=
Weight=20
Color=Green
...
Fruit=Cherry
Id=#1003
...

How do I remove all the lines corresponding to a given fruit?

I'd suggest a simpler strategy than regex. Try something like this:

user_inputted_id = get_user_inputted_id()  # e.g. "#1002"

result = []
keep_data = True
with open(fruitfile) as file:
    for line in file:
        if line.startswith("Fruit="):
            fruit_line = line  # Hold on to it until the Id line tells us what to do
        elif line.startswith("Id="):
            # Keep the record only if its id is not the user-specified one
            keep_data = line.strip() != "Id=" + user_inputted_id
            if keep_data:
                result.extend([fruit_line, line])
        elif keep_data:
            result.append(line)  # Any other line belongs to the current fruit

new_text = "".join(result)

Of course, this ends up being more code than the regex version, but it also sets you up to parse the file and store the fruit in data structures. If you just want to store each fruit as a dictionary, you could do this:

parsed_fruit = []
next_fruit = {}
with open(fruitfile) as file:
    for next_line in file:
        next_line = next_line.strip()
        if '=' not in next_line:  # Skip blank or malformed lines
            continue
        if next_line.startswith('Fruit=') and next_fruit:  # Makes sure that we don't add the initial empty dictionary
            parsed_fruit.append(next_fruit)
            next_fruit = {}
        key, value = next_line.split('=', 1)
        next_fruit[key] = value
if next_fruit:
    parsed_fruit.append(next_fruit)  # Add last fruit in file

Then it's simply a matter of iterating over the list and removing any fruit that has the id you want to get rid of.
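
That last step could look roughly like the sketch below. The helper names remove_fruit_by_id and fruits_to_text are made up for illustration, and the sample list stands in for whatever the parsing loop produced. It assumes every value is kept as a string and that the original key order should be preserved when writing the lines back out.

```python
def remove_fruit_by_id(parsed_fruit, fruit_id):
    """Return a new list without the record whose "Id" equals fruit_id."""
    return [fruit for fruit in parsed_fruit if fruit.get("Id") != fruit_id]


def fruits_to_text(parsed_fruit):
    """Turn the dictionaries back into Key=Value lines, one per line."""
    lines = [f"{key}={value}" for fruit in parsed_fruit for key, value in fruit.items()]
    return "\n".join(lines) + "\n" if lines else ""


# Sample data in the shape the parsing loop is expected to produce (assumed).
parsed_fruit = [
    {"Fruit": "Apple", "Id": "#1001", "Weight": "7", "Color": "Red"},
    {"Fruit": "Watermelon", "Id": "#1002", "Weight": "20", "Color": "Green"},
    {"Fruit": "Cherry", "Id": "#1003"},
]

remaining = remove_fruit_by_id(parsed_fruit, "#1002")
new_text = fruits_to_text(remaining)  # ready to be written back to the file
```

Building a new list rather than deleting from parsed_fruit while iterating avoids the usual pitfall of skipping elements when a list is modified during a loop.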

Update #2: made the quantifier non-greedy by adding ?

This is the raw regex:

(?s)Fruit=[^\n]*\nId=#1002.*?(?=Fruit)

Change yours to:

repl_string = "(?s)Fruit=[^\n]*\nId=" + user_inputted_id_to_match + ".*?(?=Fruit)"
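
For completeness, here is a minimal runnable sketch of the same idea. It is a variation rather than the exact pattern above: it anchors on line starts with re.MULTILINE, passes the ID through re.escape in case it contains regex metacharacters, and also accepts the end of the string in the lookahead so that the last fruit in the file can be deleted too. The function name delete_fruit is hypothetical.

```python
import re


def delete_fruit(text, fruit_id):
    """Remove the whole record whose Id line equals fruit_id."""
    pattern = re.compile(
        # A Fruit line, then the matching Id line, then lazily everything
        # up to the next Fruit line or the end of the text.
        r"^Fruit=[^\n]*\nId=" + re.escape(fruit_id) + r"$.*?(?=^Fruit=|\Z)",
        re.DOTALL | re.MULTILINE,
    )
    return pattern.sub("", text)


text_file_as_string = (
    "Fruit=Apple\nId=#1001\nWeight=7\nColor=Red\n"
    "Fruit=Watermelon\nId=#1002\nWeight=20\nColor=Green\n"
    "Fruit=Cherry\nId=#1003\n"
)

new_text = delete_fruit(text_file_as_string, "#1002")
print(new_text)
```

Replacing with an empty string (instead of "\n") avoids leaving a blank line where the record used to be, since the match already includes the record's trailing newline.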

