How do I remove multiple lines of text matched by a pattern?

Question

I have the following snippet I would like to completely remove with a regex (or some other method).

# ---------------------------------------------------------------------------------------------------------------------
# MODULE PARAMETERS
# These are the variables we have to pass in to use the module specified in the terragrunt configuration above
# ---------------------------------------------------------------------------------------------------------------------

Is there syntax to tell the regex to remove everything between the two matching lines?

It seems like it should be easy to do, but so far I have only been able to find a regex that pulls out the first and last matching lines, via the code below. I have tried a number of permutations of this regex but can't get it working.

...
re.sub(r'# --*[\S\s]*---', '', lines[line])
...

This regex tool says that my regex should work.

EDIT:

The text I am interested in matching is being read in one line at a time.

...
for the_file in files_to_update:
    with open(the_file + "/the_file", "r") as in_file:
        lines = in_file.readlines()

The lines are subsequently iterated over. The re.sub call above actually runs inside this loop.

for line in range(len(lines)):

Answer 1

You should read the file into a single variable so that you can run a regex on it that matches more than one line of text.
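As a minimal sketch of why this matters, the snippet below applies the pattern from the question first to each line separately and then to the whole text. The sample `text` is a shortened, made-up stand-in for the real file, and the variable names are only illustrative.

```python
import re

# Shortened stand-in for the file contents (assumption, not the real file).
text = (
    "# ----------\n"
    "# MODULE PARAMETERS\n"
    "# ----------\n"
    "foo = 1\n"
)

pattern = r'# --*[\S\s]*---'  # the pattern from the question

# Line by line: each call sees a single line, so only the dashed lines match
# and the comment text between them is left untouched.
per_line = [re.sub(pattern, '', line) for line in text.splitlines(keepends=True)]
print(per_line)

# Whole text: the same pattern can now span from the first dashed line
# to the last one.
whole = re.sub(pattern, '', text)
print(repr(whole))
```

Note that `[\S\s]*` is greedy, so on a file with several such blocks this pattern would run from the first dashed line to the very last one and remove everything in between. That is one reason to prefer a stricter pattern that only accepts comment lines between the delimiters.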

You may use

import re

with open(filepath, 'r') as fr:
  with open(filesavepath, 'w') as fw:
    # Read the whole file at once so the pattern can span several lines
    fw.write( re.sub(r'^# -+(?:\n# .*)*\n# -+$\n?', '', fr.read(), flags=re.M) )

See the Python demo and a regex demo.

Here, fr is the handle of the file you read from, and fw is the handle of the file you are writing to. The input for re.sub is fr.read(), which grabs the whole file contents and passes them to the regex engine.

The regex means:

^ - start of a line (due to re.M)
# -+ - a #, a space and then one or more hyphens
(?:\n# .*)* - 0 or more repetitions of a newline, #, a space, and any text up to the end of the line
\n - a newline
# -+$ - #, a space, one or more hyphens and the end of a line (due to re.M)
\n? - an optional newline.
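The breakdown above can be checked on an in-memory string. This is only a sketch: the sample `text` is a shortened, made-up version of the block from the question, and `BLOCK_RE` is an illustrative name.

```python
import re

# Shortened sample modelled on the block from the question (assumption).
text = (
    "# ----------\n"
    "# MODULE PARAMETERS\n"
    "# These are the variables we have to pass in\n"
    "# ----------\n"
    "\n"
    "foo = 1\n"
    "# some other comment\n"
    "bar = 2\n"
)

# Same pattern as in the answer; re.M makes ^ and $ match at line boundaries.
BLOCK_RE = re.compile(r'^# -+(?:\n# .*)*\n# -+$\n?', flags=re.M)

cleaned = BLOCK_RE.sub('', text)
print(cleaned)
```

The ordinary comment `# some other comment` survives because a match has to begin with a `# ` followed by at least one hyphen.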

A non-regex way of removing the comments is to read the file line by line, check whether a line starts with # ---, and toggle a flag that tracks whether we are inside the comment block or not:

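lines = []
flag = False  # True while we are inside a comment block
with open(filepath, 'r') as fr:
    for line in fr:
        if line.startswith('# ---'):
            flag = not flag  # a delimiter line toggles the state
            continue
        if not flag:
            lines.append(line)

# Lines read from a file keep their trailing newline, so join without a separator
print("".join(lines))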

See this Python demo.

Answer 2

Why not just use string methods and a small generator function?

data = """
# ---------------------------------------------------------------------------------------------------------------------
# MODULE PARAMETERS
# These are the variables we have to pass in to use the module specified in the terragrunt configuration above
# ---------------------------------------------------------------------------------------------------------------------

foo
# some other comment
bar

"""

def remover(block):
    remove = False

    for line in block.split("\n"):
        if line.startswith("# ---"):
            remove = not remove
        elif not remove:
            yield line

cleaned = [line for line in remover(data)]
print(cleaned)

This yields

['', '', 'foo', '# some other comment', 'bar', '', '']
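If the goal is text that can be written back to a file rather than a list, the yielded lines can be joined again. A minimal sketch, repeating the generator so the snippet runs on its own and using a shortened, made-up `data` string:

```python
def remover(block):
    """Yield the lines of `block` that lie outside '# ---' delimited blocks."""
    remove = False
    for line in block.split("\n"):
        if line.startswith("# ---"):
            remove = not remove  # a delimiter line toggles the state
        elif not remove:
            yield line

# Shortened sample input (assumption, not the real file).
data = "# ---\n# MODULE PARAMETERS\n# ---\n\nfoo\nbar\n"

# split("\n") dropped the newlines, so put them back when joining.
cleaned_text = "\n".join(remover(data))
print(repr(cleaned_text))
```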

Answer 1
2 accepted 2020-07-29 14:03:41

Answer 2
2 2020-07-29 14:07:31