How to parse Python loops using a Python script?
My main objective is to parse Python loops so that I can insert a few statements for my analysis.
Normal code:
#A.py
[code Starts]
.
.
.
while [condition]:
    [statements]
    [statements]
    [statements]
.
.
.
[code ends]
Instrumented code:
#A.py
[code Starts]
.
.
.
count = 0                              <---------- inserted code
print "Entry of loop"                  <---------- inserted code
while [condition]:
    print "Iteration Number", count    <---------- inserted code
    count += 1                         <---------- inserted code
    [statements]
    [statements]
    [statements]
print "Exit of loop"                   <---------- inserted code
.
.
.
[code ends]
My objective is to insert the above code at the appropriate locations with proper indentation. The loop can also be a for loop. To produce the instrumented code above, I need to parse the loops in the A.py file and insert those statements.
Is there a good way to parse these loops and get the line number of each loop so that I can instrument it?
Thank you
Parsing is usually a difficult task. You can use Pygments, a Python syntax highlighting library. This might seem different from what you intend to do, but it is not: after all, coloring code is basically adding color information to code blocks.
Using the PythonLexer you can extract the tokens of each line and add any comments you want. This will come in handy if you want to work not only on while loops but also on for loops, if statements, and so on.
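To illustrate the token-scanning idea, here is a minimal sketch. Note that it swaps in the standard library's tokenize module for Pygments, purely so that the example has no third-party dependency; scanning the tokens produced by PythonLexer would follow the same pattern. The function name find_loop_lines is hypothetical. A for or while keyword counts as a loop statement only when it is the first token of a statement, so the for inside a comprehension is ignored.

```python
import io
import tokenize

LOOP_KEYWORDS = ("for", "while")


def find_loop_lines(source):
    """Return a list of (line_number, keyword) for each loop statement."""
    loops = []
    at_statement_start = True
    reader = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(reader):
        # Blank lines, comments and line breaks inside brackets change nothing.
        if tok.type in (tokenize.NL, tokenize.COMMENT):
            continue
        # The end of a logical line or a change of indentation means the
        # next significant token begins a new statement.
        if tok.type in (tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT):
            at_statement_start = True
            continue
        if (at_statement_start
                and tok.type == tokenize.NAME
                and tok.string in LOOP_KEYWORDS):
            loops.append((tok.start[0], tok.string))
        at_statement_start = False
    return loops
```

Because the check is made on tokens rather than on raw text, indented loops and forms such as for(a,b) in x: are found as well. An "async for" statement is not reported by this sketch, since its first token is async.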
The simplest way of doing this is to scan the file line by line and add the statements when you find a line that matches.
The following code does what you want, but it is not robust at all:
def add_info_on_loops(iterable):
    in_loop = False
    for line in iterable:
        if not in_loop:
            if line.startswith('for ') or line.startswith('while '):
                # A top-level loop starts here: emit the entry code,
                # the loop header, then the per-iteration code.
                in_loop = True
                yield 'count = 0\n'
                yield 'print "Entry of loop"\n'
                yield line
                yield '    print "Iteration Number:", count\n'
                yield '    count += 1\n'
            else:
                yield line
        else:
            # The first non-indented line marks the end of the loop body.
            if not line.startswith(' '):
                in_loop = False
                yield 'print "Exit of loop"\n'
            yield line
Usage:
>>> from StringIO import StringIO
>>> code = StringIO("""[code Starts]
... .
... .
... .
... while [condition]:
...     [statements]
...     [statements]
...     [statements]
...
... .
... .
... .
... [code ends]""")
>>> print ''.join(add_info_on_loops(code))
[code Starts]
.
.
.
count = 0
print "Entry of loop"
while [condition]:
    print "Iteration Number:", count
    count += 1
    [statements]
    [statements]
    [statements]
print "Exit of loop"

.
.
.
[code ends]
Pitfalls of the code:
- A loop nested in another block, as in "if condition:" followed by an indented "for x in a: ...", isn't recognized. This can be solved by stripping leading whitespace from each line before checking whether it starts a loop (but you must then take the different levels of indentation into account, etc.).
- A loop written on a single line, such as "for x in a: print x", produces wrong output. Handling it requires looking at what follows the ":".
- The count variable is troublesome if you want to add support for nested loops. You should probably keep an integer id somewhere and use variable names such as count_0, count_1, with the id incremented every time you find a new loop.
- "for(a,b) in x:" isn't detected as a loop, while "for (a,b) in x:" is. This can be easily solved: check that the line starts with "for" or "while" and that the next character is not a letter, number or underscore (actually, in Python 3 identifiers can contain Unicode characters as well, which makes this harder to test, but still possible).
- If the file ends inside a loop, i.e. the last line is something like indented_last_line_of_code() in the body of "for x in a:", the exit print won't be added (easily fixed by adding a check on in_loop after the for loop of the function to see whether we are in this situation).
As you can see, writing a piece of code that does what you asked is not so trivial. I believe the best you can do is to use ast to parse the code, then visit the tree and add the nodes at the correct places, then re-visit the tree and generate the Python source code (nodes usually record their line in the source code, which allows you to copy-paste the exact same code).
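The ast approach can be sketched roughly as follows. This is only an illustration under several assumptions: the file being instrumented is Python 3 source (so the inserted statements use the print() function rather than the print statement used above), ast.unparse is available (Python 3.9 or later), and the names LoopInstrumenter, instrument_loops and loop_line_numbers are hypothetical. Each loop gets its own counter (count_0, count_1, ...), so nested loops do not clash.

```python
import ast


class LoopInstrumenter(ast.NodeTransformer):
    """Wrap every for/while loop with entry, iteration and exit prints."""

    def __init__(self):
        self.loop_id = 0  # each loop gets its own counter: count_0, count_1, ...

    def _instrument(self, node):
        counter = f"count_{self.loop_id}"
        self.loop_id += 1
        self.generic_visit(node)  # instrument nested loops as well
        before = ast.parse(f"{counter} = 0\nprint('Entry of loop')").body
        per_iteration = ast.parse(
            f"print('Iteration Number', {counter})\n{counter} += 1"
        ).body
        after = ast.parse("print('Exit of loop')").body
        node.body = per_iteration + node.body
        # Returning a list replaces the single loop statement with several.
        return before + [node] + after

    visit_For = _instrument
    visit_While = _instrument


def instrument_loops(source):
    """Return the instrumented version of the given Python 3 source."""
    tree = LoopInstrumenter().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9+


def loop_line_numbers(source):
    """Return sorted (line_number, node_type_name) pairs for all loops."""
    return sorted(
        (node.lineno, type(node).__name__)
        for node in ast.walk(ast.parse(source))
        if isinstance(node, (ast.For, ast.While))
    )


if __name__ == "__main__":
    print(instrument_loops("while n > 0:\n    n -= 1\n"))
```

Two limitations of this sketch are worth keeping in mind. Regenerating the source with ast.unparse drops comments and the original formatting, which is why copying the original lines by their line numbers, as suggested above, can be preferable. Also, the exit print is not reached when the loop is left through return or an exception.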