Finding the length of the string matched by a regular expression

Question

I'm trying to write a script to parse a map file generated by a compiler/linker, which looks like this:

%SEGMENT_SECTION
                                                      Start Address  End Address
--------------------------------------------------------------------------------
Segment Name: S1_1, Segment Type: .bss                0A000000       0A050F23
--------------------------------------------------------------------------------
area1_start.o (.bss)                                  0A000000       0A000003
...

                                                      Start Address  End Address
--------------------------------------------------------------------------------
Segment Name: S2_1, Segment Type: .bss                0A050F24       0A060000
--------------------------------------------------------------------------------
area2_start.o (.bss)                                  0A000000       0A000003

...

%NEXT_SECTION

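I'm currently writing several regular expressions (Python's re module) to parse this, but I'd like to write them in a very readable way so that the parsing stays simple. Essentially: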

with open('blah.map') as f:
    text = f.read()

# ... Parse the file to update text to be after the %SEGMENT_SECTION

match = segment_header_re.match(text)
seg_name, seg_type, start_addr, end_addr = match.groups()
# ... (Do more with matched values)

text = text[len(match.matched_str):]

# Parse the remainder of text

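However, I don't know how to get the length of the matched string, as in my match.matched_str pseudocode. I can't see anything in Python's re documentation. Is there a better way to do this kind of parsing?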

Answer 1

For what you are trying to achieve, use the match.span method.

>>> import re
>>> s = 'The quick brown fox jumps over the lazy dog'
>>> m = re.search('brown', s)
>>> m.span()
(10, 15)
>>> start, end = m.span()
>>> s[end:]
' fox jumps over the lazy dog'
>>>

Or just use the match.end method.

>>> s[m.end():]
' fox jumps over the lazy dog'
>>>
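
Applied to the loop from the question, this could look roughly like the sketch below. The pattern assigned to segment_header_re is an assumption written for illustration, based only on the header line shown in the question; a real map file may need something stricter.

```python
import re

# Hypothetical pattern for the "Segment Name: ..." line shown in the question.
segment_header_re = re.compile(
    r"Segment Name: (\w+), Segment Type: (\S+)\s+([0-9A-F]+)\s+([0-9A-F]+)\n"
)

text = (
    "Segment Name: S1_1, Segment Type: .bss                0A000000       0A050F23\n"
    "area1_start.o (.bss)                                  0A000000       0A000003\n"
)

match = segment_header_re.match(text)
seg_name, seg_type, start_addr, end_addr = match.groups()

# match.end() is the index just past the matched text, so the length of the
# match is end() - start(), and slicing from end() drops exactly what matched.
matched_len = match.end() - match.start()
rest = text[match.end():]
```

Here rest begins at the area1_start.o line, ready for the next pattern.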

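Another option is to use a compiled regular expression object, whose methods accept pos and endpos arguments that limit the search to a portion of the string.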

>>> s = 'The quick brown fox jumps over the lazy dog'
>>> over = re.compile('over')
>>> brown = re.compile('brown')
>>> m_brown = brown.search(s)
>>> m_brown.span()
(10, 15)
>>> m_over = over.search(s)
>>> m_over.span()
(26, 30)

Start searching for over at the end of the brown match.

>>> match = over.search(s, pos = m_brown.end())
>>> match.group()
'over'
>>> match.span()
(26, 30)

Searching for brown starting at the end of the over match produces no match.

>>> match = brown.search(s, m_over.end())
>>> match.group()

Traceback (most recent call last):
  File "<pyshell#71>", line 1, in <module>
    match.group()
AttributeError: 'NoneType' object has no attribute 'group'
>>> print(match)
None
>>>

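For long strings and many searches, using a compiled regular expression object with a start position argument avoids building a new string slice on every step, which should speed up processing.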
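
As a rough sketch of that idea for the map file, the loop below keeps an integer position instead of re-slicing text. The pattern entry_re and the second sample line (area1_end.o) are made up for illustration and are not from the question.

```python
import re

# Hypothetical pattern for lines such as "area1_start.o (.bss)  0A000000  0A000003".
entry_re = re.compile(r"(\S+) \((\.\w+)\)\s+([0-9A-F]{8})\s+([0-9A-F]{8})\n")

text = (
    "area1_start.o (.bss)                                  0A000000       0A000003\n"
    "area1_end.o (.bss)                                    0A000004       0A000007\n"
    "%NEXT_SECTION\n"
)

entries = []
pos = 0
while True:
    # match() with a pos argument tries the pattern at that index only,
    # without copying the rest of the string.
    m = entry_re.match(text, pos)
    if m is None:
        break
    entries.append(m.groups())
    pos = m.end()  # continue right after this match

remaining = text[pos:]
```

Note that passing pos is not completely equivalent to slicing: a '^' in the pattern still refers to the real start of the string (or of a line in multiline mode), not to pos.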

Answer 2

You can use the .group() method. The whole matched string can be retrieved with match.group(0):

text = text[len(match.group(0)):]

Demo:

>>> import re
>>> re.match('(a)bc(d)', 'abcde').group(0)  # 'e' is excluded since it wasn't matched
'abcd'
>>>
>>> # You can also get individual capture groups by number (starting at 1)
>>> re.match('(a)bc(d)', 'abcde').group(1)
'a'
>>> re.match('(a)bc(d)', 'abcde').group(2)
'd'
>>>

Note, however, that this raises an AttributeError if there is no match:

>>> re.match('xyz', 'abcde').group(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>>

You may want to check that the match succeeded before calling methods on the match object.
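
One way to write that check is sketched below. The helper name strip_match is hypothetical and only serves to illustrate the pattern.

```python
import re

def strip_match(pattern, text):
    """Return the text after the match, or None if pattern does not match at the start."""
    match = re.match(pattern, text)
    if match is None:
        # re.match returns None on failure, so test before touching the match object.
        return None
    return text[len(match.group(0)):]
```

For example, strip_match('(a)bc(d)', 'abcde') returns 'e', while strip_match('xyz', 'abcde') returns None instead of raising.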

Answer 1 (3, accepted, 2015-02-03 17:11:45)

Answer 2 (1, 2015-02-03 16:57:18)
