Python regex to remove comments
How would I write a regex that removes all comments that start with # and stop at the end of the line, but at the same time excludes the first two lines, which say
#!/usr/bin/python
and
#-*- coding: utf-8 -*-
You can remove comments by parsing the Python code with tokenize.generate_tokens. The following is a slightly modified version of this example from the docs:
import tokenize
import io
import sys

if sys.version_info[0] == 3:
    StringIO = io.StringIO
else:
    StringIO = io.BytesIO

def nocomment(s):
    result = []
    g = tokenize.generate_tokens(StringIO(s).readline)
    for toknum, tokval, _, _, _ in g:
        # print(toknum,tokval)
        if toknum != tokenize.COMMENT:
            result.append((toknum, tokval))
    return tokenize.untokenize(result)

with open('script.py','r') as f:
    content=f.read()

print(nocomment(content))
For example, if script.py contains
def foo(): # Remove this comment
    ''' But do not remove this #1 docstring
    '''
    # Another comment
    pass
then the output of nocomment is
def foo ():
    ''' But do not remove this #1 docstring
    '''
    pass
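The question also asks to keep the first two lines (the shebang and the coding declaration), which the function above would strip along with every other comment. Here is a minimal sketch of one way to handle that, assuming Python 3 only. The name strip_comments and the keep_first_lines parameter are made up for illustration and are not part of the tokenize module. It uses the start row that tokenize reports for each token:

```python
import io
import tokenize


def strip_comments(source, keep_first_lines=2):
    """Drop '#' comments, except those that start on the first few lines.

    `keep_first_lines` is a hypothetical parameter for this sketch.
    """
    result = []
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for toknum, tokval, (start_row, _), _, _ in tokens:
        # Rows are 1-based, so rows 1 and 2 hold the shebang and coding lines.
        if toknum == tokenize.COMMENT and start_row > keep_first_lines:
            continue
        result.append((toknum, tokval))
    # Passing (type, string) pairs makes untokenize rebuild the spacing
    # itself, so the layout within a line may differ from the input.
    return tokenize.untokenize(result)
```

As with nocomment, a # inside a string or docstring is left alone because it is part of a STRING token, not a COMMENT token.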
I don't actually think this can be done purely with a regex, as you'd need to count quotes to ensure that an instance of # isn't inside of a string.
I'd look into Python's built-in code parsing modules for help with something like this.
sed -e '1,2p' -e '/^\s*#/d' infile
Then wrap this in a subprocess.Popen call.
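A rough sketch of what that wrapper might look like. The function names build_sed_command and strip_full_line_comments are invented for this example, and it assumes a sed on the PATH that understands \s (GNU sed does):

```python
import subprocess


def build_sed_command(path):
    """Return the argument list for the sed one-liner shown above."""
    return ["sed", "-e", "1,2p", "-e", r"/^\s*#/d", path]


def strip_full_line_comments(path):
    """Run sed on `path` and return what it prints, as text."""
    proc = subprocess.Popen(
        build_sed_command(path),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output to str instead of bytes
    )
    out, err = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError("sed failed: " + err.strip())
    return out
```

Passing the command as a list avoids going through a shell, so the quoting from the command line is not needed. Note that this sed script only deletes lines that consist entirely of a comment, so trailing comments after code are left in place. It also relies on lines 1 and 2 being comments: a non-comment line in that range would be printed twice.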
However, this is no substitute for a real parser. Why does that matter? Well, consider this Python script:
output = """
This is
#1 of 100"""
Boom, any non-parsing solution instantly breaks your script.
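To make that concrete, here is a small sketch that runs a naive regex (delete everything from # to the end of the line) over the script above and checks whether the result still compiles. The helper is_valid_python is just a name picked for this example:

```python
import re

# The three-line script from above, as a single string.
source = 'output = """\nThis is\n#1 of 100"""\n'

# Naive approach: delete everything from '#' to the end of each line.
naive = re.sub(r"#.*", "", source)


def is_valid_python(code):
    """Return True if `code` compiles, False on a SyntaxError."""
    try:
        compile(code, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


print(is_valid_python(source))  # the original script compiles
print(is_valid_python(naive))   # the regex ate the closing triple quote
```

The regex removes the closing """ along with the text after the #, so the string literal is never terminated and the script no longer parses.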