Removing lines from a text file using a regular expression

Question

I have a text file containing the following lines:

Fruit=Apple
Id=#1001
Weight=7
Color=Red
...
Fruit=Watermelon
Id=#1002
Weight=20
Color=Green
...
Fruit=Cherry
Id=#1003
...

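Given the ID of a fruit to delete, I am trying to remove every line that belongs to that fruit. So if I read #1002, I want to delete everything from Fruit=Watermelon up to (but not including) Fruit=Cherry. I don't know how many lines of information each fruit has, and the number varies from fruit to fruit.

I have tried a regex along these lines: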

repl_string = "Fruit=(.*?)\nId=" + user_inputted_id_to_match + "\n(.*)(?=\nFruit=)"
re.sub(repl_string, "\n", text_file_as_string)

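Basically, I match the Fruit line, then match the Id line against whatever the user gave me, and then match everything else up to a lookahead for the next Fruit line. Does that make sense?

I ran it, and in the resulting text file only the value of Id had been removed: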

Fruit=Apple
Id=#1001
Weight=7
Color=Red
...
Fruit=Watermelon
Id=
Weight=20
Color=Green
...
Fruit=Cherry
Id=#1003
...

How can I delete all the lines that belong to the given fruit?

Answer 1

I'd suggest a simpler strategy than a regex. Try the following pseudocode:

user_inputted_id = get_user_inputted_id()

with open(fruitfile) as file:
    while there is still more in the file:
        read in "Fruit=..." line
        read in "Id=..." line
        keep_data = (id is not the user specified one)  # Reset for every fruit
        if keep_data:
            add fruit and id lines into result list/string
        while next line is not a "Fruit=..." line:
            read in the line
            if keep_data:
                add line to result
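
For reference, a runnable version of the same idea might look like the sketch below. It groups lines into records instead of tracking a flag, the function name remove_fruit is made up for illustration, and it assumes every record starts with a Fruit= line, as in the question's sample.

```python
def remove_fruit(lines, id_to_remove):
    """Return the lines with the record carrying the given Id dropped.

    Assumes each record begins with a 'Fruit=' line (hypothetical helper,
    not part of the original answer).
    """
    records = []
    for line in lines:
        # Start a new record at every 'Fruit=' line (or at the very first line).
        if line.startswith("Fruit=") or not records:
            records.append([])
        records[-1].append(line)

    target = "Id=" + id_to_remove
    # Keep only the records that do not contain the exact 'Id=...' line.
    kept = [
        rec for rec in records
        if target not in (l.rstrip("\n") for l in rec)
    ]
    return [line for rec in kept for line in rec]


sample = """Fruit=Apple
Id=#1001
Weight=7
Color=Red
Fruit=Watermelon
Id=#1002
Weight=20
Color=Green
Fruit=Cherry
Id=#1003
""".splitlines(keepends=True)

print("".join(remove_fruit(sample, "#1002")), end="")
```

Comparing whole lines rather than substrings means an ID such as #100 would not accidentally remove the #1002 record.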

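Admittedly, you end up with more code than you would with a regex, but this also sets you up to parse the file easily and store the result in a data structure. If you just want to store each fruit as a dictionary, you could do something like this: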

parsed_fruit = []
next_fruit = {}
with open(fruitfile) as file:
    for next_line in file:  # Iterating the file stops at EOF; `while file:` would loop forever
        next_line = next_line.rstrip('\n')
        if '=' not in next_line:
            continue  # Skip blank or malformed lines
        if next_line.startswith('Fruit=') and next_fruit:  # Makes sure that we don't add the initial empty dictionary
            parsed_fruit.append(next_fruit)
            next_fruit = {}
        key, value = next_line.split('=', 1)
        next_fruit[key] = value
if next_fruit:
    parsed_fruit.append(next_fruit)  # Add last fruit in file

Then just iterate over the list and remove any fruit that has the ID you want to delete.
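
That last step could look roughly like this. The helper names drop_fruit_by_id and format_fruit are hypothetical, and the sketch assumes the dictionaries use the file's own keys (Fruit, Id, ...) with values that carry no trailing newline. It also relies on dictionaries preserving insertion order (Python 3.7+) to write the lines back in their original order.

```python
def drop_fruit_by_id(parsed_fruit, id_to_remove):
    """Return a new list without the fruit whose 'Id' matches."""
    return [fruit for fruit in parsed_fruit if fruit.get("Id") != id_to_remove]


def format_fruit(parsed_fruit):
    """Turn the dictionaries back into 'key=value' lines."""
    return "".join(
        f"{key}={value}\n"
        for fruit in parsed_fruit
        for key, value in fruit.items()
    )


# Sample data mirroring the question's file (assumed already parsed).
parsed_fruit = [
    {"Fruit": "Apple", "Id": "#1001", "Weight": "7", "Color": "Red"},
    {"Fruit": "Watermelon", "Id": "#1002", "Weight": "20", "Color": "Green"},
    {"Fruit": "Cherry", "Id": "#1003"},
]

remaining = drop_fruit_by_id(parsed_fruit, "#1002")
print(format_fruit(remaining), end="")
```

Building a new list avoids the usual pitfall of deleting items from a list while iterating over it.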

Answer 2

Update #2: added the non-greedy quantifier (?)

Here is the raw regex:

(?s)Fruit=[^\n]*\nId=#1002.*?(?=Fruit)

Change yours to:

repl_string = "(?s)Fruit=[^\n]*\nId=" + user_inputted_id_to_match + ".*?(?=Fruit)"
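
To see why this works where the original attempt did not: by default . does not match a newline, so .* cannot span several lines, and the (?s) flag changes that. Here is a small self-contained sketch run against the question's sample data. The re.escape call is an addition that is not in the answer above, and the sample text is assumed.

```python
import re

# Sample data modelled on the question's file (the "..." lines are omitted).
text = (
    "Fruit=Apple\nId=#1001\nWeight=7\nColor=Red\n"
    "Fruit=Watermelon\nId=#1002\nWeight=20\nColor=Green\n"
    "Fruit=Cherry\nId=#1003\n"
)

user_inputted_id_to_match = "#1002"

# re.escape is an addition here: it keeps any regex metacharacters in the
# user's input from being interpreted as part of the pattern.
repl_string = (
    "(?s)Fruit=[^\n]*\nId="
    + re.escape(user_inputted_id_to_match)
    + ".*?(?=Fruit)"
)

# Replace the whole matched record with the empty string.
result = re.sub(repl_string, "", text)
print(result, end="")
```

One limitation to be aware of: the lookahead needs a following Fruit, so the last record in the file is never matched. Changing the lookahead to (?=Fruit|\Z) is one possible adjustment.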


Solution 1
1 2014-04-21 19:18:43

Solution 2
1 Accepted 2014-04-21 19:19:32
