What regular expression will find triple-quoted comments (possibly multi-line) in Python source code?
Python is not a regular language and cannot reliably be parsed using regex.
If you want a proper Python parser, look at the ast module. You may be looking for ast.get_docstring.
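A minimal sketch of the ast approach, assuming Python 3.8+. `collect_docstrings` is a hypothetical helper name; `ast.parse`, `ast.walk` and `ast.get_docstring` are standard-library functions. Only real docstrings (the first statement of a module, class or function) are reported, so other triple-quoted strings in the code are ignored.

```python
import ast


def collect_docstrings(source: str) -> dict:
    """Map "<module>" and each class/function name to its docstring (or None)."""
    tree = ast.parse(source)
    docs = {"<module>": ast.get_docstring(tree)}
    for node in ast.walk(tree):
        # get_docstring() only accepts these node types (plus ast.Module).
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # clean=True (the default) dedents the text via inspect.cleandoc.
            docs[node.name] = ast.get_docstring(node)
    return docs
```

Because names are used as keys, same-named methods in different classes would overwrite each other in this sketch; keying on node.lineno would avoid that.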
re.findall('(?:\n[\t ]*)\"{3}(.*?)\"{3}', s, re.M | re.S)
This captures only the text inside triple quotes that start a line, optionally preceded by spaces or tabs, as Python docstrings should be.
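A runnable version of that regex, assuming the source is held in a string. Two small changes are assumptions of this sketch: the pattern is written as a raw string, and re.M is dropped because the pattern contains no ^ or $. Since the pattern needs a newline before the opening quotes, a docstring on the very first line of a file would be missed, so the hypothetical `find_docstrings` helper prepends one. Like the original, it only handles docstrings delimited by three double quotes.

```python
import re

# The answer's pattern as a raw string: a newline, optional spaces/tabs, then """...""".
# re.S lets .*? cross line breaks; re.M is omitted because the pattern has no ^ or $.
DOCSTRING_RE = re.compile(r'(?:\n[\t ]*)"{3}(.*?)"{3}', re.S)


def find_docstrings(source: str) -> list:
    """Return the text between each pair of line-leading triple double quotes."""
    # Hypothetical tweak: prepend a newline so a docstring on line 1 is also found.
    return DOCSTRING_RE.findall("\n" + source)
```

A string such as x + """inline""" is not reported, because its opening quotes do not start a line.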
This works perfectly for me (I used it with TextMate):
"{3}([\s\S]*?"{3})
I wanted to remove all comments from a library and this took care of the triple-quote comments (single or multi-line, regardless of where they started on the line).
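The same pattern can be used from Python's re module to strip such strings; `strip_triple_quoted` is a hypothetical helper. Note that it removes every string delimited by three double quotes, including ones assigned to variables, not just docstrings, and it ignores the form with three single quotes.

```python
import re

# The TextMate pattern above, unchanged, as a raw string.
# [\s\S]*? matches any characters (newlines included), as few as possible.
TRIPLE_DQ_RE = re.compile(r'"{3}([\s\S]*?"{3})')


def strip_triple_quoted(source: str) -> str:
    """Delete every triple-double-quoted block, wherever it starts on a line."""
    return TRIPLE_DQ_RE.sub("", source)
```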
For hash comments (much easier), this works:
#.*$
I used these with TextMate, which uses the Oniguruma regular expression library by K. Kosako (http://manual.macromates.com/en/regular_expressions)
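In Python's re module, $ matches only at the end of the string (or just before a final newline) unless re.MULTILINE is set, so a sketch of the hash-comment pattern needs that flag. `strip_hash_comments` is a hypothetical helper. The pattern also treats a # inside a string literal as a comment start: for example, the line s = "#x" is cut down to s = ".

```python
import re

# MULTILINE makes $ match at every line end, not only at the end of the text.
HASH_COMMENT_RE = re.compile(r"#.*$", re.MULTILINE)


def strip_hash_comments(source: str) -> str:
    """Remove everything from each # to the end of its line."""
    return HASH_COMMENT_RE.sub("", source)
```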
I found this one, from Tim Peters (I think):
pat = """
qqq
[^\\q]*
(
( \\\\[\000-\377]
| q
( \\\\[\000-\377]
| [^\\q]
| q
( \\\\[\000-\377]
| [^\\q]
)
)
)
[^\\q]*
)*
qqq
"""
# Strip all layout whitespace (str.join takes a single argument).
pat = ''.join(pat.split())
tripleQuotePat = pat.replace("q", "'") + "|" + pat.replace('q', '"')
But, as stated by bobince, regex alone doesn't seem to be the right tool for parsing Python code.
So I went with tokenize from the standard library.
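A sketch of the tokenize route, assuming Python 3. `triple_quoted_strings` is a hypothetical helper that walks the token stream from tokenize.generate_tokens and keeps STRING tokens whose quotes are tripled. Because the tokenizer understands escapes and comments, triple quotes inside a # comment or inside an ordinary string are not mistaken for a string start. On Python 3.12 and later, f-strings are emitted as FSTRING_START/FSTRING_MIDDLE/FSTRING_END tokens rather than STRING, so this sketch skips them.

```python
import io
import tokenize


def triple_quoted_strings(source: str) -> list:
    """Return (start_row, token_text) for every triple-quoted string literal."""
    found = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING:
            # Drop any string prefix (r, b, u, f) before checking the quotes.
            body = tok.string.lstrip("rRbBuUfF")
            if body.startswith(('"""', "'''")):
                found.append((tok.start[0], tok.string))
    return found
```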
I don't know how well this will fare when scanning Python code, but it seems to match Python strings in isolation.
^(\"([^\"\n\\]|\\[abfnrtv?\"'\\0-7]|\\x[0-9a-fA-F])*\"|'([^'\n\\]|\\[abfnrtv?\"'\\0-7]|\\x[0-9a-fA-F])*'|\"\"\"((?!\"\"\")[^\\]|\\[abfnrtv?\"'\\0-7]|\\x[0-9a-fA-F])*\"\"\")$
The escaping is not standard Python; this is something that I cut-n-pasted from a project. See it in action at regex101.com.
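To try the pattern from Python, it can be pasted unchanged into a raw triple-quoted string (an assumption of this sketch: the extra backslashes before double quotes are harmless to re). `is_python_string` is a hypothetical helper built on re.fullmatch. As written, the pattern has no branch for strings delimited by three single quotes and does not accept prefixes such as r, b or f.

```python
import re

# The answer's pattern, pasted unchanged into a raw string.
PY_STRING_RE = re.compile(r'''^(\"([^\"\n\\]|\\[abfnrtv?\"'\\0-7]|\\x[0-9a-fA-F])*\"|'([^'\n\\]|\\[abfnrtv?\"'\\0-7]|\\x[0-9a-fA-F])*'|\"\"\"((?!\"\"\")[^\\]|\\[abfnrtv?\"'\\0-7]|\\x[0-9a-fA-F])*\"\"\")$''')


def is_python_string(text: str) -> bool:
    """True if the whole text is one string literal the pattern accepts."""
    return PY_STRING_RE.fullmatch(text) is not None
```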