Python find and replace multiple comment lines in array with parsed single line comment
Suppose we have read in a Python file that contains multi-line comments and some code. It is stored in data as a list or np.ndarray.
data = ["# this", "# is", "# the first comment", "print('hello world')", "# second comment"]
expected_output = ["```this is the first comment```", "print('hello world')", "``` second comment```"]
expected_output
The desired output replaces the multiple elements that start with the # character with a single parsed comment wrapped in backtick characters:
['```this is the first comment```',
"print('hello world')",
'``` second comment```']
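I can do the parsing, but I don't know how to replace the individual lines (e.g. indices [0, 1, 2] in the example above) with the newly formatted single line.
Script so far: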
from pathlib import Path
import numpy as np
from itertools import groupby
from operator import itemgetter

def get_consecutive_group_edges(data: np.ndarray):
    # https://stackoverflow.com/a/2154437/9940782
    edges = []
    for k, g in groupby(enumerate(data), lambda x: x[0] - x[1]):
        group = (map(itemgetter(1), g))
        group = list(map(int, group))
        edges.append((group[0], group[-1]))
    # convert ranges into group index
    # https://stackoverflow.com/a/952952/9940782
    group_lookup = dict(enumerate(edges))
    return group_lookup
if __name__ == "__main__":
    # https://stackoverflow.com/a/17141572/9940782
    filedata = ["# this", "# is", "# the first comment", "print('hello world')", "# second comment"]
    # find all consecutive lines starting as comments
    comment_lines = np.argwhere([l[0] == "#" for l in filedata])
    group_lookup = get_consecutive_group_edges(comment_lines)
    output_lines = []
    for comment_idx in group_lookup.keys():
        # extract the comment groups
        min_comment_line = group_lookup[comment_idx][0]
        max_comment_line = group_lookup[comment_idx][1] + 1
        data = filedata[min_comment_line: max_comment_line]
        # remove the comment characters
        output = "".join(data).replace("\n", " ").replace("#", "")
        # wrap in ```
        output = "```" + output + "```" + "\n"
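I am failing at the last step: how do I replace all the values between each group's min_comment_line and max_comment_line with the single newly parsed output?
Is there something I could do with the uncommented lines?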
non_comment_lines = np.argwhere([l[0] != "#" for l in filedata])
You can assign to a list slice in Python, which lets you replace multiple elements with a single one:
...
# make a copy of the original list, so we can replace the comments
output_lines = filedata.copy()
# iterate backwards so the indices line up
for comment_idx in reversed(group_lookup):
    # extract the comment groups
    min_comment_line = group_lookup[comment_idx][0]
    max_comment_line = group_lookup[comment_idx][1] + 1
    data = filedata[min_comment_line:max_comment_line]
    # remove the comment characters
    output = "".join(data).replace("\n", " ").replace("#", "")
    # wrap in ```
    output = "```" + output + "```"
    output_lines[min_comment_line:max_comment_line] = [output]
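To make the slice assignment and the backwards iteration more concrete, here is a minimal standalone sketch. The list `lines` and the precomputed `ranges` are made up for illustration and are not part of the original script; the joined text is left unparsed to keep the focus on the indexing.

```python
# Hypothetical example data: two runs of comment lines around one code line
lines = ["# a", "# b", "x = 1", "# c", "# d"]

# Half-open (start, stop) index ranges of the comment runs, computed up front
ranges = [(0, 2), (3, 5)]

merged = lines.copy()
# Work from the last range to the first: replacing a slice shortens the list,
# which shifts every index after it but leaves the earlier indices valid.
for start, stop in reversed(ranges):
    merged[start:stop] = ["".join(lines[start:stop])]

print(merged)  # ['# a# b', 'x = 1', '# c# d']
```

Iterating forwards over the same ranges would apply (3, 5) to a list that has already shrunk to four elements, so the second replacement would hit the wrong positions.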
However, the whole operation can be much simpler, because groupby only groups consecutive matching elements:
output_lines = []
# iterate over consecutive sections of comments and code
for is_comment, lines in groupby(filedata, key=lambda x: x[0] == "#"):
    if is_comment:
        # remove the comment characters
        output = "".join(lines).replace("\n", " ").replace("#", "")
        # wrap in ```
        output_lines.append("```" + output + "```")
    else:
        # leave code lines unchanged
        output_lines.extend(lines)
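For reference, the groupby approach can be wrapped into a self-contained function, sketched below. The function name `merge_comment_blocks` is hypothetical, and the sketch swaps `x[0] == "#"` for `str.startswith("#")`, an assumption made here so that empty strings in the input do not raise an IndexError.

```python
from itertools import groupby


def merge_comment_blocks(filedata):
    """Collapse each run of consecutive '#' lines into one backtick-wrapped string."""
    output_lines = []
    # groupby yields one group per run of consecutive lines with the same key
    for is_comment, lines in groupby(filedata, key=lambda line: line.startswith("#")):
        if is_comment:
            # join the run, then remove newlines and comment characters
            text = "".join(lines).replace("\n", " ").replace("#", "")
            output_lines.append("```" + text + "```")
        else:
            # leave code lines (including empty ones) unchanged
            output_lines.extend(lines)
    return output_lines


filedata = ["# this", "# is", "# the first comment", "print('hello world')", "", "# second comment"]
print(merge_comment_blocks(filedata))
```

Because the "#" characters are removed but the space after each one is kept, every merged comment starts with a space, e.g. "``` this is the first comment```".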