RegEx for capturing part of a string

Question

I am trying to grab top-level Markdown headings (i.e., headings beginning with a single hash, such as # Introduction) in an .md document with Python's re library, and I cannot figure it out.

Here is the code I'm trying to execute:

import re

pattern = r"(# .+?\\n)"

text = r"# Title\n## Chapter\n### sub-chapter#### What a lovely day.\n"

header = re.search(pattern, text)
print(header.string)

The result from print(header.string) is:

# Title\n## Chapter\n### sub-chapter#### What a lovely day.\n

whereas I only want # Title\n

This example on regex101 says it should work, but I can't figure out why it doesn't: https://regex101.com/r/u4ZIE0/9

Answer 1

You get that result because you print header.string. The .string attribute of a Match object returns the whole string that was passed to match() or search(), not just the part that matched.
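As a minimal sketch of the difference, the snippet below compares the two on a shortened version of the question's input. The variable names whole_input and matched_part are only illustrative.

```python
import re

# Raw string: each \n here is a literal backslash followed by "n".
text = r"# Title\n## Chapter\n"
match = re.search(r"# .+?\\n", text)

whole_input = match.string    # the full string that was searched
matched_part = match.group()  # only the text the pattern matched

print(whole_input)
print(matched_part)
```

Here match.string is the same object that was passed to re.search, while match.group() holds only the first heading.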

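Because text is a raw string, each \n in it is a literal backslash followed by n rather than a newline character, which is why your \\n pattern finds it: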

text = r"# Title\n## Chapter\n### sub-chapter#### What a lovely day.\n"
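It may help to check what the r prefix does to this string. The sketch below (the names raw_text and real_text are made up for illustration) compares a raw string with a regular one and shows which of them the \\n part of the pattern can match.

```python
import re

raw_text = r"# Title\n"   # raw string: backslash and "n" are two characters
real_text = "# Title\n"   # regular string: \n is one newline character

raw_len = len(raw_text)
real_len = len(real_text)
print(raw_len, real_len)

# The regex \\n matches a literal backslash followed by "n".
found_in_raw = re.search(r"\\n", raw_text) is not None
found_in_real = re.search(r"\\n", real_text) is not None
print(found_in_raw, found_in_real)
```

The raw string is one character longer, and only it contains the backslash that the \\n in the pattern looks for.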

So if you keep your pattern (note that the match also includes the trailing \n), you could update your code to call group() instead:

import re

pattern = r"(# .+?\\n)"
text = r"# Title\n## Chapter\n### sub-chapter#### What a lovely day.\n"
header = re.search(pattern, text)
print(header.group())

Python demo

Note that re.search looks for the first location where the regex produces a match.
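To illustrate "first location", here is a small sketch on an assumed sample text with real newlines. The pattern ## .+ could also match inside the later ### line, but re.search stops at the leftmost match.

```python
import re

text = "# Title\n## Chapter\n### sub\n"

# "## .+" also occurs inside "### sub", but search returns the leftmost match.
match = re.search(r"## .+", text)

matched = match.group()
span = match.span()
print(matched, span)
```

This also hints at why an unanchored pattern can match inside deeper headings, since nothing ties it to the start of a line.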

Another option is to anchor the match at the start of a line: match a # followed by a space, then any characters except a newline up to the end of the line. With the re.M flag, ^ and $ match at the start and end of each line, not only of the whole string. This assumes the text contains real newline characters:

^# .*$

For example:

import re

pattern = r"^# .*$"
text = "# Title\n## Chapter\n### sub-chapter#### What a lovely day.\n"
header = re.search(pattern, text, re.M)
print(header.group())

Python demo
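Since the question asks for top-level headings in general, a possible extension is re.findall with the same anchored pattern. The sketch below uses an assumed sample document with two top-level headings and also shows what happens when re.M is left out.

```python
import re

text = "# Title\n## Chapter\n# Appendix\n"
pattern = r"^# .*$"

# Without re.M, $ only matches at the very end of the string,
# so no match is found in multi-line text.
without_flag = re.search(pattern, text)

# With re.M, ^ and $ match at every line boundary.
first_heading = re.search(pattern, text, re.M).group()
all_headings = re.findall(pattern, text, re.M)

print(without_flag)
print(first_heading)
print(all_headings)
```

Lines starting with ## are skipped because the pattern requires a space right after the first #.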

If no further # can follow on the heading line, you might also use a negated character class that matches any character except # or a line break:

^# [^#\n\r]+$
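A quick sketch of how this stricter pattern behaves, on an assumed sample text. Note the trade-off: a top-level heading that itself contains a # (such as "# C# notes") is rejected.

```python
import re

text = "# Title\n## Chapter\n# C# notes\n# Appendix\n"
pattern = r"^# [^#\n\r]+$"

# re.M lets ^ and $ anchor at each line rather than the whole string.
headings = re.findall(pattern, text, re.M)
print(headings)
```

Whether that rejection is desirable depends on the headings you expect in your documents.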

Answer 2

I'm guessing that we want to extract # Title\\n. In that case, your expression seems to work fine with a slight modification:

(# .+?\\n)(.+)

Demo

Test

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"(# .+?\\n)(.+)"

test_str = "# Title\\n## Chapter\\n### sub-chapter#### The Bar\\nIt was a fall day.\\n"

subst = "\\1"

# Limit the number of replacements with the count argument
result = re.sub(regex, subst, test_str, count=1)

if result:
    print(result)

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
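As an alternative sketch (not part of the answer above), the first capture group can be read directly with re.search instead of substituting the whole match away. It reuses the same regex and test string; the names via_group and via_sub are illustrative.

```python
import re

regex = r"(# .+?\\n)(.+)"
test_str = "# Title\\n## Chapter\\n### sub-chapter#### The Bar\\nIt was a fall day.\\n"

# Read group 1 directly from the match.
match = re.search(regex, test_str)
via_group = match.group(1) if match else None

# Equivalent result using substitution, as in the answer.
via_sub = re.sub(regex, "\\1", test_str, count=1)

print(via_group)
print(via_sub)
```

Both approaches yield the same text here because the test string contains no real newline characters, so (.+) consumes everything after the first heading.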

2 answers

Answer 1: score 2, accepted, 2019-05-26 13:07:04

Answer 2: score 1, 2019-05-26 01:48:25
