I have a text file that consists of lines like this:
Fruit=Apple
Id=#1001
Weight=7
Color=Red
...
Fruit=Watermelon
Id=#1002
Weight=20
Color=Green
...
Fruit=Cherry
Id=#1003
...
I am trying to delete all the lines pertaining to a fruit, given the ID of the fruit to delete. So, if I read in #1002, I want to delete all the lines from Fruit=Watermelon all the way to (but not including) Fruit=Cherry. I don't know how many lines of info each fruit will have, and the number will vary.
I have tried using regex via the following logic:
repl_string = "Fruit=(.*?)\nId=" + user_inputted_id_to_match + "\n(.*)(?=\nFruit=)"
re.sub(repl_string, "\n", text_file_as_string)
Basically, I am matching the Fruit line, the Id line with what the user gives me, and then everything else up to a lookahead for the next Fruit line. Does that make sense?
I ran that, and the resulting text file only has Id's value removed:
Fruit=Apple
Id=#1001
Weight=7
Color=Red
...
Fruit=Watermelon
Id=
Weight=20
Color=Green
...
Fruit=Cherry
Id=#1003
...
How do I remove all the lines corresponding to a given fruit?
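I'd suggest a simpler strategy than regex: read the file one record at a time and only keep the records whose Id doesn't match. Something like this:
user_inputted_id = input("Id to delete: ").strip()  # e.g. #1002
result = []
keep_data = True
with open(fruitfile) as file:
    for line in file:  # Iterates until the end of the file
        if line.startswith("Fruit="):
            id_line = next(file, "")  # The "Id=..." line follows the "Fruit=..." line
            # Keep this fruit only if its id is not the user-specified one
            keep_data = id_line.strip() != "Id=" + user_inputted_id
            if keep_data:
                result.extend([line, id_line])  # Add fruit and id lines to the result
        elif keep_data:
            result.append(line)  # Weight, Color, etc. of a fruit we are keeping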
Of course, this ends up being more code than using a regex, but it also sets you up to easily parse the file and store the fruit in data structures. If you just want to store each fruit as a dictionary, you could do this:
parsed_fruit = []
next_fruit = {}
with open(fruitfile) as file:
    for next_line in file:  # Iterates line by line and stops at end of file
        next_line = next_line.rstrip("\n")
        if next_line.startswith("Fruit=") and next_fruit:  # Makes sure that we don't add the initial empty dictionary
            parsed_fruit.append(next_fruit)
            next_fruit = {}
        key, sep, value = next_line.partition("=")  # Split on the first "=" only
        if sep:  # Skip lines without "=", such as blank lines
            next_fruit[key] = value
if next_fruit:
    parsed_fruit.append(next_fruit)  # Add last fruit in file
Then it's simply a matter of iterating over the list and removing any fruit that has the id you want to get rid of.
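As a minimal sketch of that last step, assuming a parsed_fruit list of dictionaries like the one built above (dictionaries keep insertion order in Python 3.7+, so each record's lines come back out in file order), the filtering and re-serializing could look like this. The function names remove_fruit and serialize_fruit are hypothetical, and the sample records are simply the ones from the question.

```python
def remove_fruit(parsed_fruit, id_to_remove):
    """Return a new list without any fruit whose Id equals id_to_remove."""
    return [fruit for fruit in parsed_fruit if fruit.get("Id") != id_to_remove]


def serialize_fruit(parsed_fruit):
    """Turn the list of dictionaries back into Key=Value lines."""
    return "".join(
        f"{key}={value}\n" for fruit in parsed_fruit for key, value in fruit.items()
    )


# Hypothetical in-memory data mirroring the records in the question
parsed_fruit = [
    {"Fruit": "Apple", "Id": "#1001", "Weight": "7", "Color": "Red"},
    {"Fruit": "Watermelon", "Id": "#1002", "Weight": "20", "Color": "Green"},
    {"Fruit": "Cherry", "Id": "#1003"},
]
kept = remove_fruit(parsed_fruit, "#1002")
print(serialize_fruit(kept), end="")
```

To persist the change, write the serialized text back out, e.g. with open(fruitfile, "w") as file: file.write(serialize_fruit(kept)).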
Update #2: made the quantifier ungreedy (.*? instead of .*).
This is the raw regex:
(?s)Fruit=[^\n]*\nId=#1002.*?(?=Fruit)
Change yours to:
repl_string = "(?s)Fruit=[^\n]*\nId=" + user_inputted_id_to_match + ".*?(?=Fruit)"
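Two caveats with that pattern: the lookahead needs a following Fruit, so the last record in the file is never removed, and re.sub returns a new string rather than modifying text_file_as_string, so its result has to be assigned. The sketch below is one possible variation, not the answer's exact regex: it anchors on whole lines with re.MULTILINE, accepts the end of the text (\Z) as the end of a record, requires the id to end the line ($) so #1002 does not also match #10020, and passes the id through re.escape. The function name delete_fruit is hypothetical.

```python
import re


def delete_fruit(text, fruit_id):
    """Remove the record whose Id line equals fruit_id and return the new text."""
    pattern = (
        r"(?sm)^Fruit=[^\n]*\nId="  # s: "." also matches newlines; m: ^/$ match per line
        + re.escape(fruit_id)  # treat the user's id as literal text
        + r"$.*?(?=^Fruit=|\Z)"  # lazily consume up to the next record or end of text
    )
    return re.sub(pattern, "", text)


text_file_as_string = (
    "Fruit=Apple\nId=#1001\nWeight=7\nColor=Red\n"
    "Fruit=Watermelon\nId=#1002\nWeight=20\nColor=Green\n"
    "Fruit=Cherry\nId=#1003\n"
)
text_file_as_string = delete_fruit(text_file_as_string, "#1002")
print(text_file_as_string, end="")
```

The replacement here is an empty string rather than "\n", because the match already includes the removed record's trailing newline; replacing it with "\n" would leave a blank line behind.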