
How to parse Python loops using a Python script?

My main objective is to parse Python loops so that I can insert a few statements for my analysis.

Normal code:
#A.py

[code Starts]
.
.
.
while [condition]:
    [statements]
    [statements]
    [statements]

.
.
.
[code ends]

Instrumented code:
#A.py

[code Starts]
.
.
.
count =0                                    <---------- inserted code 
print "Entry of loop"                       <---------- inserted code
while [condition]:
    print "Iteration Number", count         <---------- inserted code
    count += 1                              <---------- inserted code
    [statements]
    [statements]
    [statements]
print "Exit of loop"                        <---------- inserted code
.
.
.
[code ends]

My objective is to insert the code above at the appropriate locations with proper indentation. The loop can also be a for loop. To produce the instrumented code, I need to parse the loops in the A.py file and insert that code.

Is there a good way to parse these loops and get the line number of each loop so that I can instrument it?

Thank you

Parsing is usually a difficult task. You can use Pygments, a Python syntax-highlighting library. This might seem different from what you intend to do, but it is not: after all, coloring code basically means attaching color information to blocks of code.

Using the PythonLexer you can extract the tokens of each line and add any comments you want. This will come in handy if you want to work not just on while loops but also on for loops, if statements, and so on.
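The token-based idea can be tried without installing anything. The sketch below is not Pygments: it swaps in the standard-library tokenize module (Python 3), which produces a comparable token stream, and reports the line number and indentation column of every for/while statement. The function name find_loop_lines is made up for illustration, and "async for" loops are deliberately ignored.

```python
import io
import tokenize

# Token types that never begin a statement and can be skipped.
_SKIP = (tokenize.ENCODING, tokenize.NL, tokenize.COMMENT,
         tokenize.INDENT, tokenize.DEDENT)


def find_loop_lines(source):
    """Return (line_number, column, keyword) for each for/while statement."""
    loops = []
    at_statement_start = True
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        if tok.type == tokenize.NEWLINE:
            # A logical line just ended, so the next token starts a statement.
            at_statement_start = True
            continue
        if (at_statement_start and tok.type == tokenize.NAME
                and tok.string in ("for", "while")):
            loops.append((tok.start[0], tok.start[1], tok.string))
        at_statement_start = False
    return loops


sample = (
    "total = 0\n"
    "for x in [1, 2]:\n"
    "    while total < 3:\n"
    "        total += 1\n"
    "squares = [y * y for y in range(3)]\n"
)
print(find_loop_lines(sample))
# -> [(2, 0, 'for'), (3, 4, 'while')]
```

Only keywords that open a logical line are reported, so the "for" inside the list comprehension on the last line is not mistaken for a loop. The column gives the indentation to reuse for the inserted statements.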

pyparsing has a sample file containing a full (?) Python grammar parser. In the long run this could be an interesting option, especially if and when your analysis project gains more features.

The simplest way of doing this is to scan the file line by line and add the statements when you find a line that matches.

The following code does what you want, but it is not robust at all:

def add_info_on_loops(iterable):
    # Copy the lines of a source file, adding counting and tracing
    # statements around every top-level for/while loop.
    in_loop = False
    for line in iterable:
        if not in_loop:
            if line.startswith('for ') or line.startswith('while '):
                in_loop = True
                # Emitted before the loop header.
                yield 'count = 0\n'
                yield 'print "Entry of loop"\n'
                yield line
                # Emitted as the first statements of the loop body.
                yield '    print "Iteration Number:", count\n'
                yield '    count += 1\n'
            else:
                yield line
        else:
            # The first line that is not indented ends the loop.
            if not line.startswith('    '):
                in_loop = False
                yield 'print "Exit of loop"\n'
            yield line

Usage:

>>> from StringIO import StringIO  # Python 2; io.StringIO in Python 3
>>> code = StringIO("""[code Starts]
... .
... .
... .
... while [condition]:
...     [statements]
...     [statements]
...     [statements]
... 
... .
... .
... .
... [code ends]""")
>>> print ''.join(add_info_on_loops(code))
[code Starts]
.
.
.
count = 0
print "Entry of loop"
while [condition]:
    print "Iteration Number:", count
    count += 1
    [statements]
    [statements]
    [statements]
print "Exit of loop"

.
.
.
[code ends]

Pitfalls of the code:

  1. The code handles only loops at the top level. A loop nested inside another block, such as a "for x in a:" under an "if condition:", isn't recognized. This can be solved by stripping the leading whitespace from each line before checking whether it starts a loop (but you must then take the different levels of indentation into account).
  2. The code breaks whenever a loop contains a line that isn't indented. This happens, for example, if you "split" the code with a blank line and the IDE strips the whitespace from it. A solution might be to wait for a non-blank, non-indented line instead of any non-indented line.
  3. The code doesn't handle tabs used for indentation (easily fixed).
  4. The code doesn't handle one-line loops (e.g. "for x in a: print x"). In this case you'll obtain wrong output. This is easily fixed by checking whether there is something after the ":".
  5. Using a single count variable is troublesome if you want to add support for nested loops. You should probably keep an integer id somewhere and use variable names such as count_0, count_1, incrementing the id every time you find a new loop.
  6. The code doesn't handle loop headers in which a parenthesis follows the keyword without whitespace: "for(a,b) in x:" isn't detected as a loop, while "for (a,b) in x:" is. This can be easily solved: check that the line starts with for or while and that the next character is not a letter, digit, or underscore (in Python 3 identifiers can also contain Unicode characters, which makes this harder to test, but still possible).
  7. The code doesn't handle source code that ends with an indented line of a loop body. If the last line of the file is the indented body of "for x in a:", the exit print won't be added. This is easily fixed by checking in_loop after the for loop of the function to see whether we are in this situation.
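A few of these fixes are small enough to sketch. The Python 3 snippet below is only an illustration, with the made-up names LOOP_HEADER, loop_header and counter_names: a regular expression accepts spaces or tabs as indentation (pitfalls 1 and 3) and requires a word boundary after the keyword (pitfall 6), and a generator hands out a fresh counter name per loop (pitfall 5). It is still line-based, so a continuation line or a string that happens to start with "for" would fool it.

```python
import itertools
import re

# Optional indentation (spaces or tabs), then "for" or "while" as a whole word.
# "\b" rejects names such as "format" or "while_true" but accepts "for(a,b)".
LOOP_HEADER = re.compile(r"^([ \t]*)(for|while)\b")


def loop_header(line):
    """Return (indent, keyword) if the line starts a loop, else None."""
    match = LOOP_HEADER.match(line)
    if match is None:
        return None
    return match.group(1), match.group(2)


def counter_names():
    """Yield count_0, count_1, ... so that nested loops get separate counters."""
    for loop_id in itertools.count():
        yield "count_%d" % loop_id


names = counter_names()
print(loop_header("for(a,b) in x:\n"), next(names))
# -> ('', 'for') count_0
print(loop_header("\twhile ok:\n"), next(names))
# -> ('\t', 'while') count_1
print(loop_header("format(x)\n"))
# -> None
```

The captured indent is what the inserted statements would need as a prefix, with one extra level for the statements placed inside the loop body.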

As you can see, writing a piece of code that does what you asked is not so trivial. I believe the best you can do is to use ast to parse the code, visit the tree and add nodes at the correct places, then walk the tree again and generate the Python source code (nodes usually record their line in the source code, which lets you copy the original code verbatim).
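One way the ast approach might look is sketched below. It assumes Python 3.9 or later, because it relies on ast.unparse to turn the tree back into source, so the question's Python 2 print statements would have to be ported first. LoopInstrumenter and instrument are illustrative names, and regenerating the source this way drops comments and the original formatting.

```python
import ast


class LoopInstrumenter(ast.NodeTransformer):
    """Wrap every for/while loop with entry, iteration and exit prints."""

    def __init__(self):
        self.loop_id = 0

    def _instrument(self, node):
        # Give each loop its own counter so nested loops do not clash.
        counter = "count_%d" % self.loop_id
        self.loop_id += 1
        self.generic_visit(node)  # instrument loops nested in this one
        before = ast.parse(
            "%s = 0\nprint('Entry of loop at line %d')" % (counter, node.lineno)
        ).body
        per_iteration = ast.parse(
            "print('Iteration Number', %s)\n%s += 1" % (counter, counter)
        ).body
        after = ast.parse("print('Exit of loop')").body
        node.body = per_iteration + node.body
        # Returning a list replaces the loop with several statements.
        return before + [node] + after

    visit_For = _instrument
    visit_While = _instrument


def instrument(source):
    """Return the source with tracing statements added around each loop."""
    tree = LoopInstrumenter().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


print(instrument("n = 0\nwhile n < 2:\n    n += 1\n"))
```

Because the tree carries the nesting structure, indentation, one-line loops and nested loops need no special handling here. The exit message is placed after the whole loop statement, so it is skipped if the loop body returns from the enclosing function.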
