
Python regex finding sub-string

I'm new to Python and regex. Here I'm trying to extract the text between two limits. The start can be mov/add/rd/sub/and/etc., and the end limit is the end of the line.

/********** sample input text file *************/
f0004030:   a0 10 20 02     mov  %l0, %psr
//some unwanted lines
f0004034:   90 04 20 03     add  %l0, 3, %o0
f0004038:   93 48 00 00     rd  %psr, %o1
f000403c:   a0 10 3f fe     sub  %o5, %l0, %g1

/*-------- Here is the code -----------*/
    import re
    import sys

    # dest and name are defined earlier in the script (not shown)
    try:
        objdump = open(dest + name, "r")
    except IOError:
        print("Error: '" + name + "' not found in " + dest)
        sys.exit()
    objdump_file = objdump.readlines()
    for objdump_line in objdump_file:
        a = ['add', 'mov', 'sub', 'rd', 'and']

        if any(x in objdump_line for x in a):   # To avoid unwanted lines

            # >>>>>>>>>> Here is the problem >>>>>>>>>>>>>
            m = re.findall('(add|mov|rd|sub|add)(.*?)($|\n)', objdump_line, re.DOTALL)
            # <<<<<<<<<<< Here is the problem <<<<<<<<<<<<<

            print(m)

/*---------- Result I'm getting --------------*/
    [('mov', '  %l0, %psr', '')]
    [('add', '  %l0, 3, %o0', '')]
    [('rd', '  %psr, %o1', '')]
    [('sub', '  %o5, %l0, %g1', '')]

/*----------- Expected result ----------------*/
    ['  %l0, %psr']
    ['  %l0, 3, %o0']
    ['  %psr, %o1']
    ['  %o5, %l0, %g1']

I have no idea why the parentheses and the unwanted quotes are appearing! Thanks in advance.

If you use groups in findall, it returns all captured groups. If you want only specific parts, use slicing (note that slicing a tuple gives a one-element tuple such as ('  %l0, %psr',); index with [0][1] if you want the bare string):

m = re.findall ('(add|mov|rd|sub|add)(.*?)($|\n)', objdump_line, re.DOTALL)[0][-2:-1]
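A minimal sketch of what findall returns for one of the sample lines with the question's pattern, and how slicing differs from indexing on that result. The line literal mimics what readlines() yields, trailing newline included.

```python
import re

# One line as produced by readlines(), including its trailing newline
line = "f0004030:   a0 10 20 02     mov  %l0, %psr\n"

# The pattern from the question: three capturing groups, so one 3-tuple per match
matches = re.findall('(add|mov|rd|sub|add)(.*?)($|\n)', line, re.DOTALL)
print(matches)               # [('mov', '  %l0, %psr', '')]

# Slicing the first match (a tuple) gives another, shorter tuple
print(matches[0][-2:-1])     # ('  %l0, %psr',)

# Indexing gives the captured string itself
print(repr(matches[0][1]))   # '  %l0, %psr'
```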

Additionally, you can solve your problem without regex. You are already checking whether the string contains any of ['add', 'mov','sub','rd', 'and'], so you can split the string and keep the fields after the mnemonic (the address, the four opcode bytes and the mnemonic are the first six fields). Picking only the last two fields would drop the first operand of three-operand lines such as add and sub:

m = ' '.join(objdump_line.split()[6:])
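The split-based idea can be wrapped in a small function. This is a sketch under one assumption: every instruction line follows the column layout of the sample input (address, four opcode bytes, mnemonic, operands). operands_from_line is a hypothetical name, not part of any library.

```python
def operands_from_line(line):
    """Return the operand text of an objdump line, or None if there is none.

    Hypothetical helper: assumes the layout of the sample input, i.e.
    address, four opcode bytes, mnemonic, then the operands.
    """
    fields = line.split()
    if len(fields) < 7:          # no fields after the mnemonic
        return None
    return ' '.join(fields[6:])  # everything after the mnemonic


sample = [
    "f0004030:   a0 10 20 02     mov  %l0, %psr\n",
    "//some unwanted lines\n",
    "f0004034:   90 04 20 03     add  %l0, 3, %o0\n",
]
for line in sample:
    print(operands_from_line(line))
```

Splitting on whitespace also normalizes the spacing, so the result is '%l0, 3, %o0' rather than the '  %l0, 3, %o0' a regex group would return. The function does not look at the mnemonic at all, so it is still worth keeping the any(...) check from the question in front of it.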

Quoting the Python documentation about findall:

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
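The rule quoted above is easiest to see by changing only the number of groups in otherwise identical patterns. The input string here is a made-up illustration, not taken from the question.

```python
import re

text = "mov %l0 add %o0"

# No groups: each item is the whole match
no_groups = re.findall(r'\w+ %\w+', text)
print(no_groups)    # ['mov %l0', 'add %o0']

# One group: each item is that group's text
one_group = re.findall(r'\w+ (%\w+)', text)
print(one_group)    # ['%l0', '%o0']

# Two groups: each item is a tuple with one entry per group
two_groups = re.findall(r'(\w+) (%\w+)', text)
print(two_groups)   # [('mov', '%l0'), ('add', '%o0')]
```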

In the output, the square brackets are the list of matches, and each pair of parentheses is one match: a tuple holding all the captured groups. A string can produce more than one match. You can access the part you want as:

re.findall ('(add|mov|rd|sub|add)(.*?)($|\n)', objdump_line, re.DOTALL)[0][1]
Here 0 selects the first match (a tuple) and 1 selects the second element of that tuple, the text captured by (.*?), since you do not want any other element.
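Applied to the whole file, the [0][1] lookup needs a guard, because findall returns an empty list for lines that do not match and [0] would then raise IndexError. A sketch over the sample lines; extract_operands is a hypothetical name, and it assumes the duplicated add in the question's pattern was meant to be and.

```python
import re

# Assumption: the duplicated "add" in the question's pattern was meant to be "and"
PATTERN = '(add|mov|rd|sub|and)(.*?)($|\n)'


def extract_operands(lines):
    """Hypothetical helper: operand text of each matching line."""
    results = []
    for line in lines:
        found = re.findall(PATTERN, line, re.DOTALL)
        if not found:  # findall returns [] when nothing matches; [0] would raise IndexError
            continue
        results.append(found[0][1])  # first match, second group (.*?)
    return results


lines = [
    "f0004030:   a0 10 20 02     mov  %l0, %psr\n",
    "//some unwanted lines\n",
    "f0004034:   90 04 20 03     add  %l0, 3, %o0\n",
    "f0004038:   93 48 00 00     rd  %psr, %o1\n",
    "f000403c:   a0 10 3f fe     sub  %o5, %l0, %g1\n",
]
print(extract_operands(lines))
```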

A capturing group captures the text matched by the expression inside its parentheses. The last capturing group, ($|\n), matches the end-of-line position without consuming any text, so you get an empty ''.
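One way to avoid the extra groups altogether, sketched here as an alternative rather than what the answers above propose, is a non-capturing group (?:...) for the mnemonics and a bare $ for the end of the line. With (.*) as the only capturing group, findall returns plain strings that match the expected result. The \b word boundaries and the inclusion of and are assumptions about what the question intends.

```python
import re

# (?:...) groups the mnemonics without capturing them, \b keeps them whole words,
# and a bare $ matches the end of the line, so (.*) is the only capturing group
pattern = re.compile(r'\b(?:add|mov|rd|sub|and)\b(.*)$')

lines = [
    "f0004030:   a0 10 20 02     mov  %l0, %psr\n",
    "//some unwanted lines\n",
    "f0004034:   90 04 20 03     add  %l0, 3, %o0\n",
    "f0004038:   93 48 00 00     rd  %psr, %o1\n",
    "f000403c:   a0 10 3f fe     sub  %o5, %l0, %g1\n",
]

results = [pattern.findall(line) for line in lines]
for r in results:
    print(r)
```

Without re.DOTALL, . stops at the newline and $ matches just before the trailing newline, so the captured text never includes the line break.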

As you mentioned in your comment about using this:

add(.*?)$

Instead, try this:

(add)(.*?)$

The () marks a capturing group, so the mnemonic is captured as well: findall then returns one ('add', operands) tuple per match, and the operands are its second element.
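A quick comparison of the two patterns on the add line from the sample input, showing how the number of capturing groups changes the shape of the findall result.

```python
import re

line = "f0004034:   90 04 20 03     add  %l0, 3, %o0\n"

# One capturing group: findall returns a list of strings
one_group = re.findall(r'add(.*?)$', line)
print(one_group)    # ['  %l0, 3, %o0']

# Two capturing groups: findall returns a list of tuples
two_groups = re.findall(r'(add)(.*?)$', line)
print(two_groups)   # [('add', '  %l0, 3, %o0')]
```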
