What is the regex for multi-line docstrings in Python?

I am working on checking comments and docstrings within a Python file. Right now I'm using regex to check, and I have succeeded in finding single-line and multi-line comments, but I am unable to find multi-line docstrings.

I have tried something like r"""[\S\s]*?""":

import re

FILE_SEPARATOR = "/"

MULTILINECOMMENT_RE = r"""/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/"""
SINGLELINECOMMENT_RE_JAVA = r"""^(?:[^"/\\]|\"(?:[^\"\\]|\\.)*
\"|/(?:[^/"\\]|\\.)|/\"(?:[^\"\\]|\\.)*\"|\\.)*//(.*)$"""
SINGLELINECOMMENT_RE_PYTHON = r"""^(?:[^"#\\]|\"(?:[^\"\\]|\\.)*\"|
/(?:[^#"\\]|\\.)|/\"(?:[^\"\\]|\\.)*\"|\\.)*#(.*)$"""
MULTILINEDOCSTRING_RE_PYTHON = r"""[\S\s]*?"""


def count_multiline__docstring_python_comment(contents):
    """Counts the number of multiline Python comments in the code"""
    pattern = re.compile(MULTILINEDOCSTRING_RE_PYTHON, re.MULTILINE)
    matches = pattern.findall(contents)
    return len(matches)

You're not matching the quotes in the docstring. The triple quotes in r"""[\S\s]*?""" are just delimiting the regexp string; they're not part of the regexp itself. You need:

r'"""[\S\s]*?"""'

You can also simplify it to just:

r'""".*?"""'

and then use the re.DOTALL flag when matching. This makes . match newlines.
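
As a rough sketch of how the simplified pattern could be dropped into a counting function like the one in the question: the names DOCSTRING_RE, SAMPLE and count_triple_quoted are made up for illustration. Note that this counts every string delimited by triple double quotes, whether or not it sits in a docstring position, and it ignores strings delimited by triple single quotes.

```python
import re

# Hypothetical name. The quotes are now part of the pattern itself,
# and re.DOTALL lets "." match newline characters.
DOCSTRING_RE = re.compile(r'""".*?"""', re.DOTALL)

# Small made-up input: one single-line and one multi-line docstring.
SAMPLE = '''
def f():
    """One-line docstring."""

def g():
    """First line.

    Second line.
    """
'''


def count_triple_quoted(contents):
    # Count the non-overlapping triple-double-quoted strings in contents.
    return len(DOCSTRING_RE.findall(contents))


print(count_triple_quoted(SAMPLE))
```

By contrast, the pattern in the question contains no quotes at all, so the lazy [\S\s]*? matches an empty string at every position instead of locating docstrings.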

To match only docstrings with at least 2 lines:

r'""".*?\n.*?"""'
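
One caveat, assuming re.DOTALL is still in effect: since . then also matches quote characters, this pattern can start at the opening quotes of a single-line docstring and run on to the opening quotes of the next docstring. A possible workaround, which is a different technique from the one above, is to match every triple-quoted string first and then keep only the matches that contain a newline. The names below are hypothetical.

```python
import re

# Hypothetical names, used only for this sketch.
TRIPLE_QUOTED_RE = re.compile(r'""".*?"""', re.DOTALL)


def find_multiline_docstrings(contents):
    # Match each triple-double-quoted string, then keep those spanning
    # more than one line (i.e. containing at least one newline).
    return [m for m in TRIPLE_QUOTED_RE.findall(contents) if "\n" in m]


# Made-up input: a single-line docstring followed by a multi-line one.
SAMPLE = (
    'def f():\n'
    '    """One-line docstring."""\n'
    '\n'
    'def g():\n'
    '    """First line.\n'
    '\n'
    '    Second line.\n'
    '    """\n'
)

multiline = find_multiline_docstrings(SAMPLE)
print(len(multiline))
```

Filtering after matching keeps each match bounded by its own pair of quotes, so a single-line docstring cannot be merged with whatever follows it.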
