Python regex for comments inside a long string

Question

I am trying to come up with a good regex for Python comments that sit inside a long string. So far I have

Regex:

#(.?|\n)*

String:

'### this is a comment\na = \'a string\'.toupper()\nprint a\n\na_var_name = " ${an.injection} "\nanother_var = " ${bn.injection} "\ndtabse_conn = " ${cn.injection} "\n\ndef do_something()\n    # this call outputs an xml stream of the current parameter dictionary.\n    paramtertools.print_header(params)\n\nfor i in xrange(256):    # wow another comment\n    print i**2\n\n'

I feel there must be a better way to get all of the individual comments out of the string, but I am no regex expert. Does anyone have a better solution?

Answer 1

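A regex will work fine if you do two things:

1. Remove all string literals (since they can contain # characters).
2. Capture everything that starts with a # character and continues to the end of the line.

Here is a demonstration: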

>>> from re import findall, sub
>>> string = '### this is a comment\na = \'a string\'.toupper()\nprint a\n\na_var_name = " ${an.injection} "\nanother_var = " ${bn.injection} "\ndtabse_conn = " ${cn.injection} "\n\ndef do_something()\n    # this call outputs an xml stream of the current parameter dictionary.\n    paramtertools.print_header(params)\n\nfor i in xrange(256):    # wow another comment\n    print i**2\n\n'
>>> findall("#.*", sub('(?s)\'.*?\'|".*?"', '', string))
['### this is a comment', '# this call outputs an xml stream of the current parameter dictionary.', '# wow another comment']
>>>

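re.sub removes anything of the form "..." or '...'. This means you do not have to worry about comment-like text inside string literals.

(?s) sets the dot-all flag, which lets . match newline characters.

Finally, re.findall grabs everything that starts with a # character and continues to the end of the line.

For a more thorough test, put this sample code in a file named test.py: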
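The two steps described above can be wrapped in a small helper. This is only a sketch: the names STRING_LITERAL, COMMENT, and extract_comments, as well as the sample input, are made up for illustration and are not part of the original answer.

```python
import re

# Step 1 pattern: '...' or "..." matched lazily; (?s) lets '.' span newlines.
STRING_LITERAL = re.compile('(?s)\'.*?\'|".*?"')
# Step 2 pattern: a '#' and everything after it up to the end of the line.
COMMENT = re.compile(r"#.*")

def extract_comments(source):
    """Strip quoted literals first, then collect what looks like a comment."""
    without_strings = STRING_LITERAL.sub("", source)
    return COMMENT.findall(without_strings)

# Hypothetical input: the first '#' sits inside a string literal.
sample = "x = '# not a comment'  # real comment\ny = 1  # second\n"
print(extract_comments(sample))  # ['# real comment', '# second']
```

Precompiling the two patterns is a convenience for repeated calls; the behavior is the same as the one-line findall/sub combination.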

# Comment 1
for i in range(10): # Comment 2
    print('#foo')
    print("abc#bar")
    print("""
#hello
abcde#foo
""")  # Comment 3
    print('''#foo
    #foo''')  # Comment 4

The solution given above still works:

>>> from re import findall, sub
>>> string = open('test.py').read()
>>> findall("#.*", sub('(?s)\'.*?\'|".*?"', '', string))
['# Comment 1', '# Comment 2', '# Comment 3', '# Comment 4']
>>>
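One caveat is worth checking before relying on this approach: the string-stripping pattern has no notion of where a comment starts, so an apostrophe inside a comment is treated as an opening quote. The sketch below reuses the same two patterns on a made-up two-line input. The helper name and the input are illustrative assumptions, not part of the original answer.

```python
import re

def extract_comments(source):
    # Same two-step approach as in the answer: drop quoted text, then grab '#...'.
    without_strings = re.sub('(?s)\'.*?\'|".*?"', '', source)
    return re.findall("#.*", without_strings)

# Hypothetical input: both comments contain an apostrophe.
tricky = "# don't do this\nx = 1  # it's fine\n"
result = extract_comments(tricky)
print(result)  # ['# dons fine']
```

The text between the two apostrophes is removed as if it were a string literal, so the two comments collapse into one. Escaped quotes inside real string literals can confuse the pattern in a similar way.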

Answer 2

Since this is Python code inside a string, I would use the tokenize module to parse it and extract the comments:

import tokenize
import StringIO

text = '### this is a comment\na = \'a string\'.toupper()\nprint a\n\na_var_name = " ${an.injection} "\nanother_var = " ${bn.injection} "\ndtabse_conn = " ${cn.injection} "\n\ndef do_something():\n    # this call outputs an xml stream of the current parameter dictionary.\n    paramtertools.print_header(params)\n\nfor i in xrange(256):    # wow another comment\n    print i**2\n\n'

tokens = tokenize.generate_tokens(StringIO.StringIO(text).readline)
for toktype, ttext, (slineno, scol), (elineno, ecol), ltext in tokens:
    if toktype == tokenize.COMMENT:
        print ttext

This prints:

### this is a comment
# this call outputs an xml stream of the current parameter dictionary.
# wow another comment

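Note that the code in the question's string has a syntax error: the colon (:) is missing after the do_something() function definition.

Also note that the ast module does not help here, because it does not preserve comments.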
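The snippet in this answer targets Python 2 (the StringIO module and the print statement). A rough Python 3 equivalent might look like the sketch below. The function name comments_from_source and the sample input are assumptions made for illustration; tokenize.generate_tokens and io.StringIO are the standard-library calls.

```python
import io
import tokenize

def comments_from_source(source):
    """Return the text of every COMMENT token found in `source`."""
    readline = io.StringIO(source).readline
    return [tok.string
            for tok in tokenize.generate_tokens(readline)
            if tok.type == tokenize.COMMENT]

# Hypothetical input: a '#' inside a string and an apostrophe inside a comment.
sample = (
    "x = '# not a comment'  # it's a real comment\n"
    "for i in range(3):  # loop\n"
    "    pass\n"
)
print(comments_from_source(sample))
```

Because the tokenizer knows where string literals begin and end, a # inside a string and an apostrophe inside a comment are both handled without extra work.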

Answer 3

Get the comment from the match group at index 1.

(#+[^\\\n]*)

Sample code:

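import re

# Group 1: one or more '#' followed by everything up to a backslash or newline.
# A plain r'' prefix is used because ur'' is a syntax error on Python 3.
p = re.compile(r'(#+[^\\\n]*)')
test_str = u"..."

re.findall(p, test_str)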

Matches:

1.  ### this is a comment
2.  # this call outputs an xml stream of the current parameter dictionary.
3.  # wow another comment
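To see what the character class [^\\\n] does in practice, here is a small sketch run against a made-up input (the sample string and variable names are assumptions, not the answer's elided test string). The class excludes both the backslash and the newline, so a match ends at whichever comes first.

```python
import re

# Same pattern as in the answer, written as a plain raw string.
pattern = re.compile(r'(#+[^\\\n]*)')

# Hypothetical input: the last comment contains a backslash.
sample = "a = 1  # first\n### second\nb = 2  # path C:\\tmp\n"

# With a single capture group, findall returns the group-1 text of each match.
matches = pattern.findall(sample)
print(matches)  # ['# first', '### second', '# path C:']
```

The third match stops at the backslash, so comments containing one are cut short. Like any pattern that only looks for #, this one also matches a # that appears inside a string literal.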

3 solutions

Solution 1
1 2014-07-18 16:33:21

Solution 2
1 2014-07-18 16:37:41

Solution 3
1 Accepted 2014-07-18 18:00:59
