
Regex to match Python docstrings

I would like to parse Python docstrings as follows:

Summary of class that is multiple lines.

Parameters
----------
param1 : str
    Param 1 is a param

Returns
-------
value : str

Examples
--------
>>> print()

Maps to

{
    'base': 'Summary of class that is multiple lines.',
    'params': 'param1 : str\n\tParam 1 is a param',
    'returns': 'value : str',
    'examples': '>>> print()'
}

This is pretty straightforward to do with named groups and re.Match.groupdict, but the issue I am running into is that each of these four groups is optional. There are several questions on here about optional groups, and one in particular seems relevant, but that case has clear ending characters to break things up. This docstring can contain any characters (I am currently using [\s\S]).
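For reference, here is a minimal sketch of how groupdict() reports an optional named group that did not take part in the match. The toy pattern and the names word and tag are hypothetical and only illustrate the behaviour; they are not part of the docstring regex.

```python
import re

# Hypothetical toy pattern: a required word followed by an optional ":<tag>" suffix.
pattern = re.compile(r"(?P<word>\w+)(?::(?P<tag>\w+))?")

# When the optional group matches, its text shows up under its name.
with_tag = pattern.fullmatch("value:str").groupdict()

# When it does not match, the key is still present and maps to None.
without_tag = pattern.fullmatch("value").groupdict()

# groupdict() accepts a default to use instead of None for unmatched groups.
defaulted = pattern.fullmatch("value").groupdict(default="")

print(with_tag)
print(without_tag)
print(defaulted)
```

So a missing section does not raise an error; it just needs to be handled as None (or as whatever default is passed in).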

I think this should work:

^(?P<base>[\s\S]+?)??(?:(?:^|\n\n)Parameters\n-{10}\n(?P<params>[\s\S]*?))?(?:(?:^|\n\n)Returns\n-{7}\n(?P<returns>[\s\S]*?))?(?:(?:^|\n\n)Examples\n-{8}\n(?P<examples>[\s\S]*))?$
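The one-line pattern is dense, so here is a sketch of the same regex written with re.VERBOSE and a comment on each piece. The name DOCSTRING_RE and the sample string are assumptions made for illustration; the pattern itself is unchanged.

```python
import re

# The same pattern as above, spread out with re.VERBOSE so each piece can be commented.
DOCSTRING_RE = re.compile(
    r"""
    ^
    (?P<base>[\s\S]+?)??        # summary: optional and lazy, so it grows only as needed
    (?:
        (?:^|\n\n)              # section starts the string or follows a blank line
        Parameters\n-{10}\n     # header plus its 10-dash underline
        (?P<params>[\s\S]*?)    # lazy body, stops where the next section can start
    )?
    (?:
        (?:^|\n\n)
        Returns\n-{7}\n
        (?P<returns>[\s\S]*?)
    )?
    (?:
        (?:^|\n\n)
        Examples\n-{8}\n
        (?P<examples>[\s\S]*)   # last section: greedy, takes the rest of the text
    )?
    $
    """,
    re.VERBOSE,
)

# Hypothetical sample input with only two of the four sections present.
sample = "Short summary.\n\nReturns\n" + "-" * 7 + "\nvalue : str"
groups = DOCSTRING_RE.search(sample).groupdict()
print(groups)
```

The lazy quantifiers are what make the optional sections work: each body grows one character at a time, and the engine checks at every step whether the next header (or the end of the string) can follow.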

The code I used to generate this regex:

import re

sep_regex = r"(?:^|\n\n)"
summary_regex  = r"(?P<base>[\s\S]+?)"
param_regex    = rf"(?:{sep_regex}Parameters\n-{{10}}\n(?P<params>[\s\S]*?))"
returns_regex  = rf"(?:{sep_regex}Returns\n-{{7}}\n(?P<returns>[\s\S]*?))"
examples_regex = rf"(?:{sep_regex}Examples\n-{{8}}\n(?P<examples>[\s\S]*))"

combined_regex = rf"^{summary_regex}??{param_regex}?{returns_regex}?{examples_regex}?$"

print(combined_regex)

Example:

from pprint import pprint
match = re.search(combined_regex, text)  # text being your example text
pprint(match.groupdict())
# out: {'base': 'Summary of class that is multiple lines.',
# out:  'examples': '>>> print()',
# out:  'params': 'param1 : str\n    Param 1 is a param',
# out:  'returns': 'value : str'}

I also tested it with various sections of the docstring dropped.
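A sketch of what such a check might look like, assuming the same pattern rebuilt here so the block runs on its own. The helper parse_sections and the fixture names are hypothetical.

```python
import re

# Same pattern as in the answer, rebuilt so this block is self-contained.
SEP = r"(?:^|\n\n)"
PATTERN = re.compile(
    rf"^(?P<base>[\s\S]+?)??"
    rf"(?:{SEP}Parameters\n-{{10}}\n(?P<params>[\s\S]*?))?"
    rf"(?:{SEP}Returns\n-{{7}}\n(?P<returns>[\s\S]*?))?"
    rf"(?:{SEP}Examples\n-{{8}}\n(?P<examples>[\s\S]*))?$"
)

# Hypothetical fixtures: each section of the example docstring as its own string.
SUMMARY = "Summary of class that is multiple lines."
PARAMS = "Parameters\n" + "-" * 10 + "\nparam1 : str\n    Param 1 is a param"
RETURNS = "Returns\n" + "-" * 7 + "\nvalue : str"
EXAMPLES = "Examples\n" + "-" * 8 + "\n>>> print()"


def parse_sections(text):
    """Return the four named groups; a missing section comes back as None."""
    match = PATTERN.search(text)
    return match.groupdict() if match else None


# Sections are separated by a blank line, as in the original docstring.
full = parse_sections("\n\n".join([SUMMARY, PARAMS, RETURNS, EXAMPLES]))
no_params = parse_sections("\n\n".join([SUMMARY, RETURNS, EXAMPLES]))
only_returns = parse_sections(RETURNS)
summary_only = parse_sections(SUMMARY)

for result in (full, no_params, only_returns, summary_only):
    print(result)
```

Note that the sections still have to appear in the order Parameters, Returns, Examples; the pattern does not handle them in any other order.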

Instead of writing your own regular expressions, you can parse docstrings with an existing library whose authors have already done the hard work for you.

I put together an example of this using the docstring-parser package. To install this package you need to run this command:

pip install docstring-parser

Then you can use the following code to parse your docstring:

from docstring_parser import parse

docstring_text = """Summary of class that is multiple lines.

Parameters
----------
param1 : str
    Param 1 is a param

Returns
-------
value : str

Examples
--------
>>> print()
"""

docstring = parse(docstring_text)
docstring_info = {
    "base": docstring.short_description,
    "params": [
        {
            "name": param.arg_name,
            "type": param.type_name,
            "description": param.description,
        }
        for param in docstring.params
    ],
    "returns": {
        "name": docstring.returns.return_name,
        "type": docstring.returns.type_name,
    }
    if docstring.returns
    else {},
    "examples": [{"snippet": example.snippet} for example in docstring.examples],
}
print(docstring_info)

This gives the following output (with indentation added for clarity):

{
    "base": "Summary of class that is multiple lines.",
    "params": [{"name": "param1", "type": "str", "description": "Param 1 is a param"}],
    "returns": {"name": "value", "type": "str"},
    "examples": [{"snippet": ">>> print()"}],
}
