Way to substitute only part of a regex string in Python

I am working with a text file laid out like this:

SCN DD1251       
            UPSTREAM               DOWNSTREAM               FILTER              
          NODE     LINK          NODE    LINK                LINK               
        DD1271      C           DD1271    R                                     
        DD1351      D           DD1351    B                                     
                    E                                                           
                                                                                
SCN DD1271       
            UPSTREAM               DOWNSTREAM               FILTER              
          NODE     LINK          NODE    LINK                LINK               
        DD1301      T           DD1301    A                                     
        DD1251      R           DD1251    C                                     
                                                                                
SCN DD1301       
            UPSTREAM               DOWNSTREAM               FILTER              
          NODE     LINK          NODE    LINK                LINK               
        DD1271      A           DD1271    T                                     
                    B                                                           
                    C                                                           
                    D                                                           
                                                                                
SCN DD1351       
            UPSTREAM               DOWNSTREAM               FILTER              
          NODE     LINK          NODE    LINK                LINK               
                    A           DD1251    D                                     
        DD1251      B                                                           
                    C   

I am currently using the following regex pattern to match the node, followed by the 5-wide gap and the letter after it, like so:

DD1251      B

[A-Z]{2}[0-9]{3}[0-9A-Z]      [A-Z]

My goal is to replace the 5-wide gap with an underscore, so that it looks like this:

DD1251_B

I am trying to achieve this with the following code:

import re

def RemoveLinkSpace(input_file, output_file, pattern):
    with open(str(input_file) + ".txt", "r") as file_input:
        with open(str(output_file) + ".txt", "w") as output:
            for line in file_input:
                # Replaces the whole match (node, spaces and letter) with "_"
                line = pattern.sub("_", line)
                output.write(line)

upstream_pattern = re.compile(r"[A-Z]{2}[0-9]{3}[0-9A-Z]      [A-Z]")

RemoveLinkSpace("File1", "File2", upstream_pattern)

However, this produces a text file that looks like this:

SCN DD1251       
            UPSTREAM               DOWNSTREAM               FILTER              
          NODE     LINK          NODE    LINK                LINK               
        _      C           DD1271    R                                     
        _      D           DD1351    B                                     
                    E                                                           
                                                                                
SCN DD1271       
            UPSTREAM               DOWNSTREAM               FILTER              
          NODE     LINK          NODE    LINK                LINK               
        _      T           DD1301    A                                     
        _      R           DD1251    C      

                           

My question: is there a way to still search for the entire regex, but replace only the spaces contained within it?

You can substitute by group, which is the piece you were missing. \1 refers to the first captured group and \2 to the second, so in the search pattern ([A-Z]{2}[0-9]{3}[0-9A-Z]) is the first group and ([A-Z]) is the second.
Also, the gap between group 1 and group 2 is 6 spaces, not 5, so I match a run of 5 or more consecutive spaces instead of a fixed count.
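
To see the group substitution in isolation, here is a minimal sketch applied to a single line copied from the sample data rather than to a file. The variable names are illustrative only. The \g<1> spelling is the standard re alternative to \1, and it stays unambiguous if a digit ever follows the reference in the replacement string.

```python
import re

# Group 1 = node ID, [ ]{5,} = five or more spaces, group 2 = link letter
pattern = re.compile(r"([A-Z]{2}[0-9]{3}[0-9A-Z])[ ]{5,}([A-Z])")

# One line from the sample: the upstream pair has 6 spaces, the downstream pair has 4
line = "        DD1271      C           DD1271    R"

# Re-emit both groups with "_" between them; the matched spaces are dropped
result = pattern.sub(r"\1_\2", line)

# Same replacement written with the explicit \g<n> form
result_explicit = pattern.sub(r"\g<1>_\g<2>", line)

print(result)
```

Only the upstream pair becomes DD1271_C. The downstream pair is left alone because its gap is 4 spaces, which is too short for [ ]{5,}.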

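import re

def RemoveLinkSpace(input_file, output_file, pattern):
    # Read <input_file>.txt line by line and write the result to <output_file>.txt
    with open(str(input_file) + ".txt", "r") as file_input:
        with open(str(output_file) + ".txt", "w") as output:
            for line in file_input:
                # Keep group 1 (node) and group 2 (link letter); put "_" where the spaces were
                line = pattern.sub(r"\1_\2", line)
                output.write(line)

# Group 1: node ID; [ ]{5,}: five or more spaces; group 2: link letter
upstream_pattern = re.compile(r"([A-Z]{2}[0-9]{3}[0-9A-Z])[ ]{5,}([A-Z])")

RemoveLinkSpace("in", "out", upstream_pattern)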
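
As a side note, the question asks literally for a way to replace only the spaces. A different technique from the capture groups used above is to wrap the context in lookarounds, so that the match itself covers nothing but the spaces. This is a sketch under the assumption that node IDs are always exactly 6 characters, since Python's re module requires a fixed-width lookbehind. The names lookaround_pattern and sample_lines are made up for the example.

```python
import re

# Capture-group approach from the answer above, for comparison
group_pattern = re.compile(r"([A-Z]{2}[0-9]{3}[0-9A-Z])[ ]{5,}([A-Z])")

# Lookaround approach: the node and the letter are asserted but not consumed,
# so the match covers only the run of spaces
lookaround_pattern = re.compile(r"(?<=[A-Z]{2}[0-9]{3}[0-9A-Z])[ ]{5,}(?=[A-Z])")

# A few lines taken from the sample file
sample_lines = [
    "SCN DD1251       ",
    "        DD1271      C           DD1271    R",
    "                    A           DD1251    D",
    "        DD1251      B",
]

# Replace only the matched spaces with "_"
results = [lookaround_pattern.sub("_", line) for line in sample_lines]

for line, replaced in zip(sample_lines, results):
    # Check that both approaches agree on every sample line
    same = group_pattern.sub(r"\1_\2", line) == replaced
    print(repr(replaced), same)
```

Both patterns give the same output on these lines. The group version is usually easier to read, while the lookaround version lets the replacement string stay a plain "_".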
