Regex to match whitespace and multiple optional patterns before the start of the text

Question

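I am parsing strings that contain code like the example below. A string can start with blank lines, followed by multiple optional patterns. These patterns can be Python-style inline comments (using the hash # character) or the command "!mycommand". Both must start at the beginning of a line. How can I write a regex that matches everything up to the start of the code?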

mystring = """

# catch this comment
!mycommand
# catch this comment
#catch this comment too
!mycommand

# catch this comment
!mycommand
!mycommand

some code. match until the previous line
# do not catch this comment
!mycommand
# do not catch this comment
"""

import re
pattern = r'^\s*^#.*|!mycommand\s*'
m = re.search(pattern, mystring, re.MULTILINE)
mystring[m.start():m.end()]

mystring = 'code. do not match anything' + mystring
m = re.search(pattern, mystring, re.MULTILINE)

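I want the regex to match the string up to the line "some code. match until the previous line". I have tried different things, but I am probably getting stuck on repeating the two patterns multiple times.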

Answer 1

Without re.MULTILINE, you can repeat the alternation and match 0+ whitespace characters before and after each occurrence:

^(?:\s*(?:#.*|!mycommand\s*))+\s*

For example:

import re
m = re.search(r'^(?:\s*(?:#.*|!mycommand\s*))+\s*', mystring)
print(m.group())
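As a quick check of how this pattern behaves, here is a minimal sketch. The string `sample` is a shortened, hypothetical stand-in for the question's `mystring`, and the name `PATTERN` is only for illustration. Because `^` is not used with re.MULTILINE and the group is repeated with `+`, the pattern should find nothing when the string starts with code.

```python
import re

# Pattern from this answer; without re.MULTILINE, ^ only matches at position 0.
PATTERN = re.compile(r'^(?:\s*(?:#.*|!mycommand\s*))+\s*')

# Shortened, hypothetical sample in the same shape as the question's string.
sample = "\n# comment\n!mycommand\n\nsome code\n# later comment\n"

header = PATTERN.search(sample)
print(repr(header.group()))  # the leading blank lines, comments and commands

# With code in front, the required first "#..." or "!mycommand" is not found
# at the start of the string, so there is no match at all.
no_match = PATTERN.search("code first" + sample)
print(no_match)
```

The comment and command after "some code" are left alone because the match is anchored to the start of the string.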

Answer 2

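Your pattern matches a single instance of #... or !mycommand. One way to fix this is to put all of them into one match and use re.search to find that first match.

To do this, you need to repeat the part that matches #... or !mycommand using *: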

^\s*^(?:#.*\s*|!mycommand\s*)*

I also changed #.* to #.*\s* so that it runs on to the next line that contains non-whitespace characters.
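To compare the two patterns side by side, here is a small sketch. The string `sample` is a shortened, hypothetical version of the question's input, and the variable names are only for illustration. Both patterns are used with re.MULTILINE, as in the question.

```python
import re

# Shortened, hypothetical sample in the same shape as the question's string.
sample = "\n# comment\n!mycommand\n\nsome code\n# later comment\n"

original = r'^\s*^#.*|!mycommand\s*'           # pattern from the question
repeated = r'^\s*^(?:#.*\s*|!mycommand\s*)*'   # pattern from this answer

# The original pattern stops after a single comment (or a single command).
first = re.search(original, sample, re.MULTILINE)
print(repr(first.group()))

# The repeated group consumes the whole leading block in one match.
block = re.search(repeated, sample, re.MULTILINE)
print(repr(block.group()))

# Every part of the repeated pattern is optional, so a string that starts
# with code still produces a match, but an empty one at position 0.
empty = re.search(repeated, "code first" + sample, re.MULTILINE)
print(repr(empty.group()))
```

Note that the last call does not return None: it returns an empty match, which may or may not be what you want when the string starts with code.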

In reply to your comment:

"If the string starts with code, this regex should not match anything"

You can try:

\A\s*^(?:#.*\s*|!mycommand\s*)+

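I changed the first anchor to \A so that it only matches at the absolute start of the string, not at the start of a line. I also changed the last * to + so that at least one #... or !mycommand must be present.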
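A minimal sketch of how this last pattern could be used follows. The helper name `leading_block` and the shortened `sample` string are hypothetical. The sketch assumes re.MULTILINE is still passed, since the inner `^` has to match after the leading blank lines.

```python
import re

# \A pins the match to the very start of the string; the inner ^ relies on
# re.MULTILINE so that it can match right after the leading blank lines.
HEADER = re.compile(r'\A\s*^(?:#.*\s*|!mycommand\s*)+', re.MULTILINE)

def leading_block(text):
    """Return the leading comments/commands, or '' if the text starts with code."""
    m = HEADER.search(text)
    return m.group() if m else ''

# Shortened, hypothetical sample in the same shape as the question's string.
sample = "\n# comment\n!mycommand\n\nsome code\n# later comment\n"

found = leading_block(sample)
nothing = leading_block("code first" + sample)
print(repr(found))
print(repr(nothing))

# Without re.MULTILINE the inner ^ can only match at position 0, so a string
# that begins with a blank line no longer matches.
plain = re.search(r'\A\s*^(?:#.*\s*|!mycommand\s*)+', sample)
print(plain)
```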

Answer 3

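Match and return the comments at the start of the string

No regex is needed: read the lines and append them to a list until a line appears that does not start with ! or #, ignoring all blank lines: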

mystring = "YOUR_STRING_HERE"

results = []
for line in mystring.splitlines():
  if not line.strip():                                      # Skip blank lines
    continue
  if not line.startswith('#') and not line.startswith('!'): # Reject if does not start with ! or #
    break
  else:
    results.append(line)                                    # Append comment

print(results)

Result:

['# catch this comment', '!mycommand', '# catch this comment', '#catch this comment too', '!mycommand', '# catch this comment', '!mycommand', '!mycommand']

Remove the comments at the start of the string

results = []
flag = False
for line in mystring.splitlines():
  if not flag and not line.strip():
    continue
  if not flag and not line.startswith('#') and not line.startswith('!'):
    flag = True
  if flag:
    results.append(line)

print("\n".join(results))

Output:

some code. match until the previous line
# do not catch this comment
!mycommand
# do not catch this comment

Regex approach

import re
print(re.sub(r'^(?:(?:[!#].*)?\n)+', '', mystring))

If lines may start with optional indentation whitespace, add [^\S\n]*:

print(re.sub(r'^(?:[^\S\n]*(?:[!#].*)?\n)+', '', mystring, count=1))

count=1 ensures that only the first match is removed (there is no need to check all the other lines).

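Regex details

^ - start of the string
(?:[^\S\n]*(?:[!#].*)?\n)+ - one or more occurrences of:
- [^\S\n]* - optional horizontal whitespace
- (?:[!#].*)? - an optional sequence of:
  - [!#] - ! or #
  - .* - the rest of the line
- \n - a newline character.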
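The breakdown above can also be written directly into the pattern with re.VERBOSE. This is only a sketch: the name `LEADING` and the shortened `sample` string are hypothetical, and the # inside the character class is escaped as a precaution so it is clearly not read as a comment.

```python
import re

LEADING = re.compile(r'''
    ^                 # start of the string (no re.MULTILINE)
    (?:
        [^\S\n]*      # optional horizontal whitespace
        (?:[!\#].*)?  # optionally "!" or "#" followed by the rest of the line
        \n            # the line break
    )+                # one or more such lines
''', re.VERBOSE)

# Shortened, hypothetical sample with an indented comment and blank lines.
sample = "\n  # comment\n!mycommand\n\nsome code\n# later comment\n"

# Strip the leading blank lines, comments and commands; the rest is untouched.
stripped = LEADING.sub('', sample, count=1)
print(stripped)
```

The comment after "some code" survives because the pattern is anchored to the start of the string and stops at the first line that is neither blank nor starts with ! or #.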

3 answers

Answer 1
2 2020-07-02 13:56:48

Answer 2
1 accepted 2020-07-02 13:27:00

Answer 3
1 2020-07-02 13:29:51
