Strip multiline Python docstrings with regex
I want to strip all Python docstrings out of a file using a simple search and replace. The following (extremely) simplistic regex does the job for one-line docstrings:
""".*"""
How can I extend that to work with multi-line docstrings? I tried including \s in a number of places, to no avail.
Since you cannot use an inline s (DOTALL) modifier, the usual workaround for matching any character, newlines included, is a character class that combines a shorthand class with its negation:
"""[\s\S]*?"""
or
"""[\d\D]*?"""
or
"""[\w\W]*?"""
Each of these matches an opening """, then zero or more characters of any kind, as few as possible because *? is a lazy quantifier, and then the closing """.
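As a quick check of the difference, here is a minimal Python sketch. The sample source string and the variable names are made up for illustration. It compares the original one-line pattern, the [\s\S] workaround, and (for environments where flags are available) the equivalent re.DOTALL form.

```python
import re

# Hypothetical sample: a function with a multi-line docstring.
source = '''def greet():
    """Say hello.

    This docstring spans several lines.
    """
    return "hello"
'''

# `.` does not match a newline by default, so the one-line pattern finds nothing here.
single_line = re.sub(r'""".*"""', '', source)

# [\s\S] matches any character, newlines included; *? stops at the first closing """.
multi_line = re.sub(r'"""[\s\S]*?"""', '', source)

# Where a flag can be passed, re.DOTALL lets `.` match newlines as well.
with_flag = re.sub(r'""".*?"""', '', source, flags=re.DOTALL)

print(multi_line)
```

The one-line pattern leaves the text unchanged, while the other two remove the docstring and leave only the indentation that preceded it.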
Sometimes there are multiline strings that are not docstrings. For example, you may have a complicated SQL query that extends across multiple lines. The following attempts to match only multiline strings that appear before class definitions or directly after function definitions.
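import re

# Sample source: two docstrings plus a multiline SQL string that must survive.
input_str = """'''
This is a class level docstring
'''
class Article:
    def print_it(self):
        '''
        method level docstring
        '''
        print('Article')
        sql = '''
SELECT * FROM mytable
WHERE DATE(purchased) >= '2020-01-01'
'''
"""

# A triple-quoted string followed, after optional whitespace, by "class".
doc_reg_1 = r'("""|\'\'\')([\s\S]*?)(\1\s*)(?=class)'
# A triple-quoted string on the line right after a "def ...:" line.
# Group 2 captures the indentation together with the quotes,
# so the closing delimiter must be indented the same way.
doc_reg_2 = r'(\s+def\s+.*:\s*)\n(\s*"""|\s*\'\'\')([\s\S]*?)(\2[^\n\S]*)'

input_str = re.sub(doc_reg_1, '', input_str)
input_str = re.sub(doc_reg_2, r'\1', input_str)  # keep the def line itself
print(input_str)
Prints:
class Article:
    def print_it(self):
        print('Article')
        sql = '''
SELECT * FROM mytable
WHERE DATE(purchased) >= '2020-01-01'
'''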
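One thing to watch for: in doc_reg_2 the backreference \2 includes the leading whitespace, so an indented one-line docstring such as """Load rows.""" is not matched, because its closing quotes are not preceded by the same indentation. The sketch below wraps the two patterns in a helper and compares them with a variant that matches the indentation outside the captured group. The names strip_docstrings and DEF_DOC_LOOSE, and the variant pattern itself, are assumptions for illustration and are not part of the original answer.

```python
import re

# The two patterns from the answer above, compiled once.
CLASS_DOC = re.compile(r'("""|\'\'\')([\s\S]*?)(\1\s*)(?=class)')
DEF_DOC = re.compile(r'(\s+def\s+.*:\s*)\n(\s*"""|\s*\'\'\')([\s\S]*?)(\2[^\n\S]*)')

# Hypothetical variant: the indentation is matched outside the delimiter group,
# so the backreference \2 holds only the quotes.
DEF_DOC_LOOSE = re.compile(r'(\s+def\s+.*:\s*)\n\s*("""|\'\'\')([\s\S]*?)\2[^\n\S]*')


def strip_docstrings(source, def_pattern=DEF_DOC):
    # Remove a string that precedes "class", then a string directly after a def line.
    source = CLASS_DOC.sub('', source)
    return def_pattern.sub(r'\1', source)


# Hypothetical sample: a one-line docstring followed by a multiline SQL string.
sample = '''
def load():
    """Load rows."""
    query = """
SELECT 1
"""
    return query
'''

strict = strip_docstrings(sample)                # original patterns
loose = strip_docstrings(sample, DEF_DOC_LOOSE)  # variant pattern
print(loose)
```

With this sample, the original patterns leave the text untouched, while the variant removes the one-line docstring and keeps the SQL string. Neither version parses Python, so strings in unusual positions can still be missed or removed by mistake.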