Find length of string matched by regex
I am trying to write a script to parse a map file generated by a compiler/linker that looks like this:
%SEGMENT_SECTION
Start Address End Address
--------------------------------------------------------------------------------
Segment Name: S1_1, Segment Type: .bss 0A000000 0A050F23
--------------------------------------------------------------------------------
area1_start.o (.bss) 0A000000 0A000003
...
Start Address End Address
--------------------------------------------------------------------------------
Segment Name: S2_1, Segment Type: .bss 0A050F24 0A060000
--------------------------------------------------------------------------------
area2_start.o (.bss) 0A000000 0A000003
...
%NEXT_SECTION
I am currently writing several regular expressions (Python's re module) to parse this, but I want to write them in a very easy-to-read way, so that the parsing code stays simple. Essentially:
with open('blah.map') as f:
    text = f.read()
# ... Parse the file to update text to be after the %SEGMENT_SECTION
match = segment_header_re.match(text)
seg_name, seg_type, start_addr, end_addr = match.groups()
# ... (Do more with matched values)
text = text[len(match.matched_str):]
# Parse the remainder of text
However, I don't know how to get the length of the matched string, as in my match.matched_str pseudo code. I don't see anything in Python's documentation of re. Is there a better way to do this type of parsing?
For what you are trying to achieve, use the match.span method.
>>> import re
>>> s = 'The quick brown fox jumps over the lazy dog'
>>> m = re.search('brown', s)
>>> m.span()
(10, 15)
>>> start, end = m.span()
>>> s[end:]
' fox jumps over the lazy dog'
>>>
Or just the match.end method.
>>> s[m.end():]
' fox jumps over the lazy dog'
>>>
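To answer the literal question of the match length, the span can be turned into a length directly. The following is a minimal sketch that reuses the sentence from the session above; the variable names length_from_span and length_from_group are only illustrative.

```python
import re

s = 'The quick brown fox jumps over the lazy dog'
m = re.search('brown', s)

# Length of the matched text, computed from the match positions.
length_from_span = m.end() - m.start()

# The same length, computed from the matched substring itself.
length_from_group = len(m.group(0))

print(length_from_span, length_from_group)

# Text remaining after the match. Note that s[m.end():] is the right
# slice here: slicing by the length alone is only equivalent when the
# match starts at index 0, as it does with re.match().
remainder = s[m.end():]
print(repr(remainder))
```

With re.match the match always starts at index 0 of the string, so match.end() and the match length are the same number, which is why text[match.end():] works as a drop-in for the pseudo code in the question.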
Another option is to use compiled regular expression objects, whose search and match methods accept pos and endpos arguments to limit the search to a portion of the string.
>>> s = 'The quick brown fox jumps over the lazy dog'
>>> over = re.compile('over')
>>> brown = re.compile('brown')
>>> m_brown = brown.search(s)
>>> m_brown.span()
(10, 15)
>>> m_over = over.search(s)
>>> m_over.span()
(26, 30)
Begin the search for over at the end of the match for brown.
>>> match = over.search(s, pos = m_brown.end())
>>> match.group()
'over'
>>> match.span()
(26, 30)
Searching for brown starting at the end of the match for over will not produce a match.
>>> match = brown.search(s, m_over.end())
>>> match.group()
Traceback (most recent call last):
  File "<pyshell#71>", line 1, in <module>
    match.group()
AttributeError: 'NoneType' object has no attribute 'group'
>>> print(match)
None
>>>
For long strings and multiple searches, using a compiled regular expression object with a start position argument avoids the repeated string copies that slicing creates, which should speed things up.
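Applied to the map file from the question, the pos approach might look like the sketch below. It is only an illustration under stated assumptions: MAP_TEXT is a shortened, hypothetical sample without the separator lines, and the patterns segment_header_re and object_line_re as well as the helper parse_segments are guesses at the layout, not the asker's actual expressions.

```python
import re

# Hypothetical, shortened sample modelled on the map file in the question.
MAP_TEXT = (
    "Segment Name: S1_1, Segment Type: .bss 0A000000 0A050F23\n"
    "area1_start.o (.bss) 0A000000 0A000003\n"
    "Segment Name: S2_1, Segment Type: .bss 0A050F24 0A060000\n"
    "area2_start.o (.bss) 0A000000 0A000003\n"
)

# Assumed line layouts; adjust them to the real file.
segment_header_re = re.compile(
    r"Segment Name: (\w+), Segment Type: (\S+)\s+([0-9A-F]{8})\s+([0-9A-F]{8})\n"
)
object_line_re = re.compile(
    r"(\S+) \((\S+)\)\s+([0-9A-F]{8})\s+([0-9A-F]{8})\n"
)


def parse_segments(text):
    """Walk through text with a moving position instead of re-slicing it."""
    pos = 0
    segments = []
    while pos < len(text):
        # pattern.match(text, pos) anchors the match at index pos.
        header = segment_header_re.match(text, pos)
        if header is None:
            break  # no further segment header at this position
        name, seg_type, start_addr, end_addr = header.groups()
        pos = header.end()

        # Collect the object lines that follow the header.
        objects = []
        while True:
            obj = object_line_re.match(text, pos)
            if obj is None:
                break
            objects.append(obj.groups())
            pos = obj.end()

        segments.append((name, seg_type, start_addr, end_addr, objects))
    return segments


for segment in parse_segments(MAP_TEXT):
    print(segment)
```

Each successful match advances pos to match.end(), so the text is never copied and the parser always knows its absolute offset in the file, which is handy for error messages.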
You can use the .group() method. The entire matched string can be retrieved by match.group(0):
text = text[len(match.group(0)):]
Demo:
>>> import re
>>> re.match('(a)bc(d)', 'abcde').group(0) # 'e' is excluded since it wasn't matched
'abcd'
>>>
>>> # You can also get individual capture groups by number (starting at 1)
>>> re.match('(a)bc(d)', 'abcde').group(1)
'a'
>>> re.match('(a)bc(d)', 'abcde').group(2)
'd'
>>>
Note however that this will raise an AttributeError if there was no match:
>>> re.match('xyz', 'abcde').group(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>>
You may wish to add a check that the match was successful before calling methods on the match object.
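One way to write such a check is sketched below. The helper name consume is hypothetical and not part of the re module; it simply wraps the None test so that the calling code stays readable.

```python
import re


def consume(pattern, text):
    """Match pattern at the start of text.

    Returns (groups, remaining_text) on success, or (None, text)
    unchanged when the pattern does not match.
    """
    match = pattern.match(text)
    if match is None:
        return None, text
    return match.groups(), text[match.end():]


abcd_re = re.compile('(a)bc(d)')
xyz_re = re.compile('xyz')

groups, rest = consume(abcd_re, 'abcde')
print(groups, repr(rest))

missing, untouched = consume(xyz_re, 'abcde')
print(missing, repr(untouched))
```

Returning the text unchanged on failure lets the caller try the next pattern on the same input without any extra bookkeeping.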