Regex: Start of string does not behave as expected
I am parsing (by regex) a .map file generated by a linker for ARM. I have extracted pretty much everything, but this section is resisting.
Here is an excerpt of the part I want to parse:
COMMON 0x20002b18 0x1 ./2_Programa/source/board.o
0x20002b18 BOARD_ctx
COMMON 0x20002b19 0x87 ./2_Programa/source/interface_objects.o
0x20002b19 GLB_appIntObjPropChangeFlags
0x20002b1a GLB_aioBLCommand
0x20002b65 GLB_aioDateTime
COMMON 0x20002ba0 0x31 ./2_Programa/source/objects.o
0x20002ba0 GLB_goFlags
*fill* 0x20002bd1 0x3
and this is my best regex attempt:
^ COMMON\s+(0x\S+)\s+(0x\S+).*(?:\s+(0x\S+)\s+(\S+)[\r\n])*(?:\s+\*fill\*\s+0x\S+\s+(0x\S+))?
The result can be checked here. The result I get only matches the last line of each block (I consider a block to be anything that starts with COMMON).
What I need to extract is something similar to this:
[{
'name': 'GLB_appIntObjPropChangeFlags',
'size': 0x01,
'path': './2_Programa/source/interface_objects.o',
'origin': 0x20002b19
},
{
'name': 'GLB_aioBLCommand',
'size': 0x87,
'path': './2_Programa/source/interface_objects.o',
'origin': 0x20002b1a
},
...
]
My main problem here is that I am not able to separate the first line
COMMON 0x20002b19 0x87 ./2_Programa/source/interface_objects.o
from the other lines related to it:
0x20002b19 GLB_appIntObjPropChangeFlags
0x20002b1a GLB_aioBLCommand
0x20002b65 GLB_aioDateTime
Could anyone give me some hints on how to tackle this?
UPDATE
What I would like to do is split every block (those that start with COMMON) into two parts. Group 1:
COMMON 0x20002b19 0x87 ./2_Programa/source/interface_objects.o
and Group 2:
0x20002b19 GLB_appIntObjPropChangeFlags
0x20002b1a GLB_aioBLCommand
0x20002b65 GLB_aioDateTime
Then, I could regex each group separately:
Regex for Group 1:
^ COMMON\s+(0x\S+)\s+(\S+)\s+(\S+)
and this other one for Group 2 (with the multiline flag set):
^\s+(0x\S+)\s+(\S+)
As a result, I will get three groups from the first regex and six more from the second (2 per line over 3 lines), which could easily be converted into a list of dicts as I showed above.
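The two-step idea described above might be sketched in Python as follows. This is only a sketch under stated assumptions: the block-splitting pattern BLOCK_RE, the helper name parse_common, and the indentation used in the sample text are not from the original post, and each entry takes its size from the block's COMMON line because the excerpt gives no per-symbol size.

```python
import re

# Sample text based on the excerpt in the question. The exact indentation is
# an assumption about how the real .map file is laid out.
MAP_TEXT = """\
 COMMON         0x20002b18        0x1 ./2_Programa/source/board.o
                0x20002b18                BOARD_ctx
 COMMON         0x20002b19       0x87 ./2_Programa/source/interface_objects.o
                0x20002b19                GLB_appIntObjPropChangeFlags
                0x20002b1a                GLB_aioBLCommand
                0x20002b65                GLB_aioDateTime
 COMMON         0x20002ba0       0x31 ./2_Programa/source/objects.o
                0x20002ba0                GLB_goFlags
 *fill*         0x20002bd1        0x3
"""

# Step 1 (hypothetical pattern): split each block into the COMMON header line
# (group 1) and the run of following lines that start with an address (group 2).
BLOCK_RE = re.compile(
    r"^[ \t]*(COMMON[^\n]*)\n((?:[ \t]*0x[^\n]*(?:\n|$))*)",
    re.MULTILINE,
)

# Step 2: the two per-group patterns from the question, lightly adapted.
HEADER_RE = re.compile(r"COMMON\s+(0x\S+)\s+(\S+)\s+(\S+)")
SYMBOL_RE = re.compile(r"^[ \t]*(0x\S+)[ \t]+(\S+)", re.MULTILINE)


def parse_common(text):
    """Return one dict per symbol found inside the COMMON blocks of text."""
    entries = []
    for block in BLOCK_RE.finditer(text):
        header = HEADER_RE.match(block.group(1))
        if header is None:
            continue  # malformed header line, skip the block
        _block_origin, size, path = header.groups()
        for symbol in SYMBOL_RE.finditer(block.group(2)):
            entries.append({
                "name": symbol.group(2),
                "size": int(size, 16),  # assumption: size of the whole block
                "path": path,
                "origin": int(symbol.group(1), 16),
            })
    return entries


if __name__ == "__main__":
    for entry in parse_common(MAP_TEXT):
        print(entry)
```

Because the symbol lines are collected only while they start with an address, the *fill* line ends the last block instead of being parsed as a symbol.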
Realistically, you should grab each COMMON block, as Wiktor Stribiżew mentioned in the comments under your question. Link to Wiktor's regex here. A regex cannot loop over a subpattern and hand back every iteration (that's not its purpose).
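To illustrate that limitation, here is a small Python sketch (the variable names are made up for the example, and the three symbol lines are taken from the question). With Python's re module, a capture group inside a repeated group reports only its last iteration, which is why the original attempt ends up with just the last line of a block.

```python
import re

lines = (
    "0x20002b19 GLB_appIntObjPropChangeFlags\n"
    "0x20002b1a GLB_aioBLCommand\n"
    "0x20002b65 GLB_aioDateTime\n"
)

# One match that repeats the "address name" sub-pattern three times.
repeated = re.match(r"(?:(0x[0-9a-f]+)[ \t]+(\S+)\n)+", lines)

# The groups only remember the final repetition.
print(repeated.group(1))  # 0x20002b65
print(repeated.group(2))  # GLB_aioDateTime

# Matching the sub-pattern on its own returns every line instead.
pairs = re.findall(r"(0x[0-9a-f]+)[ \t]+(\S+)", lines)
print(len(pairs))  # 3
```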
Less practically, you can use this regex to match each COMMON line and each of the lines that follow it, and then map the results.
See the regex in use here:
(?:COMMON\s+0x[0-9a-f]+\s+(0x[0-9a-f]+)\s+(\S+)|\s*(0x[0-9a-f]+)\s+(\S+))(?=\s*[\r\n])
Option 1: COMMON\s+0x[0-9a-f]+\s+(0x[0-9a-f]+)\s+(\S+)
  COMMON  The characters COMMON literally
  \s+  One or more whitespace characters
  0x  The characters 0x literally
  [0-9a-f]+  One or more of the characters in the set 0-9a-f
  \s+  One or more whitespace characters
  (0x[0-9a-f]+)  Capture the following into capture group 1
    0x  The characters 0x literally
    [0-9a-f]+  One or more of the characters in the set 0-9a-f
  \s+  One or more whitespace characters
  (\S+)  Capture one or more non-whitespace characters into capture group 2
Option 2: \s*(0x[0-9a-f]+)\s+(\S+)
  \s*  Any number of whitespace characters
  (0x[0-9a-f]+)  Capture the following into capture group 3
    0x  The characters 0x literally
    [0-9a-f]+  One or more of the characters in the set 0-9a-f
  \s+  One or more whitespace characters
  (\S+)  Capture one or more non-whitespace characters into capture group 4
(?=\s*[\r\n])  Ensure what follows is any number of whitespace characters, followed by a line-break character (\r or \n)
Based on the order of the matches and the groups to which they belong, you can map them to an array like the one you presented.
For example (in match order):
Group 1: 0x1
Group 2: ./2_Programa/source/board.o
Group 3: 0x20002b18
Group 4: BOARD_ctx
Group 1: 0x87
Group 2: ./2_Programa/source/interface_objects.o
Group 3: 0x20002b19
Group 4: GLB_appIntObjPropChangeFlags
Group 3: 0x20002b1a
Group 4: GLB_aioBLCommand
Group 3: 0x20002b65
Group 4: GLB_aioDateTime
Always associate the most recent match for groups 1 and 2 with the current match for groups 3 and 4.
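The mapping described above could look roughly like this in Python. It is a sketch, not a definitive implementation: the function name map_matches, the sample indentation, and the rule that a symbol name never looks like a bare hex number are assumptions. That last rule is there because option 2 of the pattern can also match the address/size pair on a *fill* line.

```python
import re

# The single alternation regex from this answer, unchanged.
PATTERN = re.compile(
    r"(?:COMMON\s+0x[0-9a-f]+\s+(0x[0-9a-f]+)\s+(\S+)"
    r"|\s*(0x[0-9a-f]+)\s+(\S+))(?=\s*[\r\n])"
)
HEX_RE = re.compile(r"0x[0-9a-f]+")

# Sample based on the question's excerpt; the indentation is an assumption.
MAP_TEXT = """\
 COMMON         0x20002b18        0x1 ./2_Programa/source/board.o
                0x20002b18                BOARD_ctx
 COMMON         0x20002b19       0x87 ./2_Programa/source/interface_objects.o
                0x20002b19                GLB_appIntObjPropChangeFlags
                0x20002b1a                GLB_aioBLCommand
                0x20002b65                GLB_aioDateTime
 COMMON         0x20002ba0       0x31 ./2_Programa/source/objects.o
                0x20002ba0                GLB_goFlags
 *fill*         0x20002bd1        0x3
"""


def map_matches(text):
    """Pair each group 3/4 match with the latest group 1/2 match."""
    entries = []
    size = path = None
    for match in PATTERN.finditer(text):
        if match.group(1) is not None:
            # Option 1 matched: remember the block's size and object path.
            size, path = int(match.group(1), 16), match.group(2)
        elif path is not None and not HEX_RE.fullmatch(match.group(4)):
            # Option 2 matched a symbol line. A "name" that is itself a hex
            # number is assumed to come from a *fill* line and is skipped.
            entries.append({
                "name": match.group(4),
                "size": size,
                "path": path,
                "origin": int(match.group(3), 16),
            })
    return entries


if __name__ == "__main__":
    for entry in map_matches(MAP_TEXT):
        print(entry)
```

Note that the lookahead requires a line break after each matched line, so the last line of the input needs a trailing newline to be picked up.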