
Python: find and replace multiple comment lines in an array with a parsed single-line comment

Suppose we have read in a Python file that contains multi-line comments and some code. It is stored in `data` as a `list` or `np.ndarray`:

```python
data = ["# this", "# is", "# the first comment", "print('hello world')", "# second comment"]

expected_output = ["```this is the first comment```", "print('hello world')", "``` second comment```"]
expected_output
```

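The desired output replaces each run of consecutive elements starting with the `#` character with a single parsed comment wrapped in backtick characters: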

```
['```this is the first comment```',
 "print('hello world')",
 '``` second comment```']
```

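I can do the parsing, but I don't know how to replace the multiple lines (e.g. indices `[0, 1, 2]` in the example above) with the newly formatted single line.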

Script so far:

```python
from pathlib import Path
import numpy as np
from itertools import groupby
from operator import itemgetter


def get_consecutive_group_edges(data: np.ndarray):
    # https://stackoverflow.com/a/2154437/9940782
    edges = []

    # within a run of consecutive indices, (position - value) stays constant
    for k, g in groupby(enumerate(data), lambda x: x[0] - x[1]):
        group = map(itemgetter(1), g)
        group = list(map(int, group))
        edges.append((group[0], group[-1]))

    # convert ranges into group index
    # https://stackoverflow.com/a/952952/9940782
    group_lookup = dict(enumerate(edges))

    return group_lookup

if __name__ == "__main__":

    # https://stackoverflow.com/a/17141572/9940782
    filedata = ["# this", "# is", "# the first comment", "print('hello world')", "# second comment"]

    # find all consecutive lines starting as comments
    # (flatnonzero gives a flat 1-D index array; startswith is safe on empty lines)
    comment_lines = np.flatnonzero([l.startswith("#") for l in filedata])
    group_lookup = get_consecutive_group_edges(comment_lines)

    output_lines = []
    for comment_idx in group_lookup.keys():
        # extract the comment groups
        min_comment_line = group_lookup[comment_idx][0]
        max_comment_line = group_lookup[comment_idx][1] + 1
        data = filedata[min_comment_line: max_comment_line]

        # remove the comment characters
        output = "".join(data).replace("\n", " ").replace("#", "")
        # wrap in ```
        output = "```" + output + "```" + "\n"
```
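The `get_consecutive_group_edges` helper above relies on one observation: for a sorted list of integers, position minus value is constant inside each run of consecutive numbers and changes at every gap. A minimal sketch of that idea on plain Python ints (the name `consecutive_runs` is hypothetical, not from the question), returning inclusive `(first, last)` pairs:

```python
from itertools import groupby


def consecutive_runs(indices: list[int]) -> list[tuple[int, int]]:
    """Return inclusive (first, last) pairs for each run of consecutive ints.

    Assumes `indices` is sorted ascending with no duplicates.
    """
    runs = []
    # enumerate pairs each value with its position; position - value only
    # changes where the sequence skips a number, so groupby splits there.
    for _, group in groupby(enumerate(indices), key=lambda pair: pair[0] - pair[1]):
        values = [value for _, value in group]
        runs.append((values[0], values[-1]))
    return runs


# Comment lines at positions 0, 1, 2 and 4 (position 3 is code).
print(consecutive_runs([0, 1, 2, 4]))  # [(0, 2), (4, 4)]
```

These pairs are the same shape as the values stored in `group_lookup`.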

I'm failing at the last step: how do I replace all the values between `min_comment_line` and `max_comment_line` of each group with the single, newly parsed `output`?

Could I do something with the non-comment lines instead?

```python
non_comment_lines = np.flatnonzero([not l.startswith("#") for l in filedata])
```
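The non-comment lines are exactly the gaps between comment runs, so one option is a single forward pass that copies those gaps verbatim and emits one merged line per run, with no index shifting to worry about. A minimal sketch, assuming the runs arrive as sorted, non-overlapping, inclusive `(first, last)` pairs like the values of `group_lookup`; the name `rebuild_with_runs` is hypothetical:

```python
def rebuild_with_runs(lines: list[str], runs: list[tuple[int, int]]) -> list[str]:
    """Collapse each inclusive (first, last) run of comment lines into one line.

    Assumes `runs` is sorted and non-overlapping (e.g. group_lookup.values()).
    """
    output = []
    cursor = 0  # index of the next line in `lines` that has not been copied yet
    for first, last in runs:
        # Non-comment lines between the previous run and this one are kept as-is.
        output.extend(lines[cursor:first])
        # Same cleanup as the question: join the run and drop every '#'.
        text = "".join(lines[first:last + 1]).replace("\n", " ").replace("#", "")
        output.append("```" + text + "```")
        cursor = last + 1
    # Whatever follows the last run (possibly nothing) is code, too.
    output.extend(lines[cursor:])
    return output


filedata = ["# this", "# is", "# the first comment", "print('hello world')", "# second comment"]
print(rebuild_with_runs(filedata, [(0, 2), (4, 4)]))
# ['``` this is the first comment```', "print('hello world')", '``` second comment```']
```

As with the question's own `replace("#", "")`, the space after each `#` survives, so the merged text starts with a space; append `.strip()` to `text` if that matters.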

You can assign to a list slice in Python, which replaces several elements with a single one:

```python
    ...
    # make a copy of the original list, so we can replace the comments
    output_lines = filedata.copy()
    # iterate backwards so the indices line up
    for comment_idx in reversed(group_lookup):
        # extract the comment groups
        min_comment_line = group_lookup[comment_idx][0]
        max_comment_line = group_lookup[comment_idx][1] + 1
        data = filedata[min_comment_line:max_comment_line]

        # remove the comment characters
        output = "".join(data).replace("\n", " ").replace("#", "")
        # wrap in ```
        output = "```" + output + "```"
        output_lines[min_comment_line:max_comment_line] = [output]
```
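Iterating backwards matters because each slice assignment shortens the list and shifts every later element to the left. A small illustration, using a hypothetical `collapse` helper and made-up `<first-last>` placeholder strings in place of the parsed comments, that runs the same replacements in both orders:

```python
def collapse(lines: list[str], runs: list[tuple[int, int]], reverse: bool = True) -> list[str]:
    """Replace each inclusive (first, last) run with a one-element placeholder."""
    result = lines.copy()
    order = reversed(runs) if reverse else runs
    for first, last in order:
        # Slice assignment swaps the whole run for a single element, so the
        # list shrinks by (last - first) items and later indices shift left.
        result[first:last + 1] = [f"<{first}-{last}>"]
    return result


lines = ["a", "# x", "# y", "b", "# z"]
runs = [(1, 2), (4, 4)]

print(collapse(lines, runs))                 # ['a', '<1-2>', 'b', '<4-4>']
print(collapse(lines, runs, reverse=False))  # ['a', '<1-2>', 'b', '# z', '<4-4>']
```

In forward order the stale index 4 points past the end of the shortened list, so the placeholder is appended and `"# z"` is never replaced.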

However, the whole operation can be much simpler, because `groupby` only groups consecutive matching elements:

```python
    output_lines = []
    # iterate over consecutive sections of comments and code
    # (str.startswith avoids an IndexError on empty lines, unlike x[0])
    for is_comment, lines in groupby(filedata, key=lambda x: x.startswith("#")):
        if is_comment:
            # remove the comment characters
            output = "".join(lines).replace("\n", " ").replace("#", "")
            # wrap in ```
            output_lines.append("```" + output + "```")
        else:
            # leave code lines unchanged
            output_lines.extend(lines)
```
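The same `groupby` approach wrapped into a self-contained function, as a sketch (the name `merge_comments` is hypothetical). `groupby` starts a new group whenever the key changes, so each group is a maximal run of comment lines or of code lines:

```python
from itertools import groupby


def merge_comments(lines: list[str]) -> list[str]:
    """Collapse each run of consecutive '#' lines into one ```-wrapped line."""
    output = []
    for is_comment, group in groupby(lines, key=lambda line: line.startswith("#")):
        if is_comment:
            # Join the run, turn any newlines into spaces, and drop the '#'s.
            text = "".join(group).replace("\n", " ").replace("#", "")
            output.append("```" + text + "```")
        else:
            # Code (and blank) lines pass through unchanged.
            output.extend(group)
    return output


filedata = ["# this", "# is", "# the first comment", "print('hello world')", "# second comment"]
print(merge_comments(filedata))
# ['``` this is the first comment```', "print('hello world')", '``` second comment```']
```

To run it on a real file, `Path("some_script.py").read_text().splitlines()` (the filename is a placeholder) produces a list of lines without trailing newlines; blank lines come through as empty strings, which `startswith` handles.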


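Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.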

 
© 2020-2024 STACKOOM.COM