Regex: filtering and deleting specific multi-line text from a file with Python

Question

I am writing a Python tool to process a set of files. The tool will be used by other users, not by me.

The files have a format like this:

#Text which I want to keep intact
#Lots of text 
#Lots and lots of text 
#Lots and lots and lots of other text 

#Then in-between the file I have text in this format which I want to operate on:

ginstance 
{ 
 name ginstance_053D627B1349FA0BC57 
 node "FINDME" 
 inherit_xform on 
 visibility 255 
blah 
blah 
blah 
} 

ginstance 
{ 
 name ginstance_053D627B1349FA0BC57 
 node "DONTFINDME" 
 inherit_xform on 
 visibility 255 
blah 
blah 
blah 
}

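What I want to do is:

Find these instances in the input file.
Check each instance for a specific word, e.g. "FINDME".
If the word is present, delete the instance from the file, i.e. delete the text from "ginstance" through the closing curly brace "}".

My tool will get this search word ("FINDME") from the user through a UI.

I can find the instances I want to delete with: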

import re

with open("path to input file", 'r') as infile:
    with open("path to output file", 'w') as output:
        xfile = infile.read()
        # Collect every block from "ginstance" up to the first "}"
        instance = re.findall(r"ginstance.*?}", xfile, re.DOTALL)
        for a in instance:
            if "FINDME" in a:
                print(a)

Also, this code deletes all instances from the input file and writes the result to the output:

        data = re.sub(r"ginstance.*?}", "", xfile, flags=re.DOTALL)
        output.write(data)

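But I don't want to delete all instances, only those that contain "FINDME". How can I write a single piece of Python code that combines these two parts?

I hope the question is clear. Thanks.

I searched Stack Overflow extensively for this problem and tried many answers before posting this question.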

Answer 1

You could take this approach:

ginstance\s*\{     # look for ginstance { literally
[^}]*              # anything not a }
(?:node\ "FINDME") # node "FINDME" literally
[^}]*              # anything not a }
\}                 # the closing }

It assumes that there is no other } inside a ginstance block.
In Python this would be:

import re
# Raw string so that \s, \{ and \} reach the regex engine unchanged
rx = re.compile(r"""
        ginstance\s*\{
        [^}]*
        (?:node\ "FINDME")
        [^}]*
        \}
        """, re.VERBOSE)
string = rx.sub('', your_string_here)
print(string)

See the demos on regex101.com and ideone.com.
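Since the question says the search word comes from a UI, the pattern above can be built from a variable instead of being hard-coded. The following is only a minimal sketch, assuming Python 3; `remove_instances` and its parameters are hypothetical names that do not appear in the original answer. `re.escape` keeps any regex metacharacters in the user's input literal.

```python
import re

def remove_instances(text, term):
    """Delete every ginstance block containing node "<term>".

    Hypothetical helper. Like the pattern above, it assumes that a
    block contains no '}' other than its closing brace.
    """
    pattern = re.compile(
        r'ginstance\s*\{[^}]*node "' + re.escape(term) + r'"[^}]*\}'
    )
    return pattern.sub("", text)

sample = '''#keep me
ginstance
{
 name ginstance_053D627B1349FA0BC57
 node "FINDME"
}

ginstance
{
 name ginstance_053D627B1349FA0BC57
 node "DONTFINDME"
}
'''

cleaned = remove_instances(sample, "FINDME")
print(cleaned)
```

To use it on files, read the whole input file into a string, pass it through the function, and write the returned string to the output file, as in the question's `with open(...)` code.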

The opposite:

Given your comment (to achieve the opposite), you could use a negative lookahead solution, like so:

ginstance\s*\{
(?:
    [^}]
    (?!(?:node\ "FINDME"))
)+
\}

A demo of this is also available on regex101.com.
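For illustration, here is the negative lookahead pattern compiled in Python and applied to a shortened version of the sample. This is a sketch assuming Python 3; the names `inverse_rx`, `sample` and `kept` are made up for the example.

```python
import re

# The inverse pattern from above: it only matches blocks in which no
# character is directly followed by node "FINDME".
inverse_rx = re.compile(r"""
        ginstance\s*\{
        (?:
            [^}]
            (?!(?:node\ "FINDME"))
        )+
        \}
        """, re.VERBOSE)

sample = '''ginstance
{
 node "FINDME"
}

ginstance
{
 node "DONTFINDME"
}
'''

# Removing the matches leaves only the block that contains node "FINDME".
kept = inverse_rx.sub("", sample)
print(kept)
```

Note that the lookahead is checked after each consumed character, so this relies on at least one character (here a newline) sitting between the opening brace and `node`.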

Answer 2

Try this

ginstance.*?{.*?node\s*"FINDME".*?}

Regex demo

Input

#Text which I want to keep intact
#Lots of text 
#Lots and lots of text 
#Lots and lots and lots of other text 

#Then in-between the file I have text in this format which I want to operate on:

ginstance 
{ 
 name ginstance_053D627B1349FA0BC57 
 node "FINDME" 
 inherit_xform on 
 visibility 255 
blah 
blah 
blah 
} 

ginstance 
{ 
 name ginstance_053D627B1349FA0BC57 
 node "DONTFINDME" 
 inherit_xform on 
 visibility 255 
blah 
blah 
blah 
}

Output

MATCH 1
1.  [194-317]   `
ginstance 
{ 
 name ginstance_053D627B1349FA0BC57 
 node "FINDME" 
 inherit_xform on 
 visibility 255 
blah 
blah 
blah 
}`
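One caveat with the lazy `.*?` version is worth checking before relying on it: with `re.DOTALL` (assumed here, as in the question's code), `.*?` is allowed to run past a closing `}`. If a non-matching block comes before the matching one, the match can start at the earlier block and swallow both. A small sketch, assuming Python 3, with the two sample blocks in reverse order and a `[^}]*` variant for comparison (the variable names are made up for the example):

```python
import re

# Same two kinds of block as in the sample, but with the
# non-matching block first.
text = '''ginstance
{
 node "DONTFINDME"
}

ginstance
{
 node "FINDME"
}
'''

# Pattern from this answer, with DOTALL so that '.' also matches newlines.
lazy = re.compile(r'ginstance.*?{.*?node\s*"FINDME".*?}', re.DOTALL)
# Variant that cannot cross a closing brace.
bounded = re.compile(r'ginstance\s*\{[^}]*node\s*"FINDME"[^}]*\}')

lazy_result = lazy.sub("", text)        # both blocks are removed
bounded_result = bounded.sub("", text)  # only the FINDME block is removed

print(repr(lazy_result))
print(bounded_result)
```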

Answer 3

Don't you see that FINDME also occurs inside DONTFINDME? That is why both of them match. If the word is in quotes, then use this:

if '"FINDME"' in a:
    print(a)

Or, better still, use re.search() with word boundaries (\b):

if re.search(r"\bFINDME\b", a, re.MULTILINE):
    print(a)
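The word-boundary check can also be combined with the question's `re.sub` call by passing a function as the replacement, so that finding and deleting happen in one pass. This is only a sketch assuming Python 3; `drop_if_contains` is a hypothetical helper name, and the block pattern is the one from the question.

```python
import re

def drop_if_contains(text, term):
    # Hypothetical helper: remove a ginstance block only when `term`
    # occurs in it as a whole word.
    word = re.compile(r"\b" + re.escape(term) + r"\b")

    def replace(match):
        block = match.group(0)
        # Returning the block unchanged keeps it in the output.
        return "" if word.search(block) else block

    return re.sub(r"ginstance.*?}", replace, text, flags=re.DOTALL)

sample = '''ginstance
{
 name ginstance_053D627B1349FA0BC57
 node "FINDME"
}

ginstance
{
 name ginstance_053D627B1349FA0BC57
 node "DONTFINDME"
}
'''

result = drop_if_contains(sample, "FINDME")
print(result)
```

The `\b` anchors behave as expected when the term starts and ends with a letter, digit or underscore; a term with other leading or trailing characters would need a different check.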
