Python regex replace only subgroup

I would like to change every word that occurs after from or on into (1_word), but only the word that comes right after from or on; other occurrences of the same word must stay unchanged.

Input:

with crossroad
from crossroad
(on pike)

Expected output:

with crossroad
from (1_crossroad)
(on (1_pike))

Code I tried:

import re

text = "with crossroad\nfrom crossroad\n(on pike)"
rgxsubtable = re.compile(r"(?:from|on)[\s]+([\w\d.\"]+)", re.MULTILINE | re.IGNORECASE)  # find the occurrences to change
tlist = set(rgxsubtable.findall(text))

for item in tlist:
    # {0} is filled in by str.format with the escaped word
    text = re.sub(r"(?!\B\w){0}(?<!\w\B)".format(re.escape(item)), "(1_{0})".format(item), text)

This replaces both occurrences of crossroad instead of only the one after "from", which I understand. But I don't know how to selectively replace only the word after from/on.

Output obtained:

with (1_crossroad)
from (1_crossroad)
(on (1_pike))

You can use

import re
text = "with crossroad\nfrom crossroad\n(on pike)"
rgxsubtable = re.compile(r"\b((?:from|on)\s+)([\w.\"]+)", re.IGNORECASE)  # find the occurrences to change
print( rgxsubtable.sub(r"\1(1_\2)", text) )

See the Python demo and the regex demo.

NOTE: If the strings after from and on are just non-whitespace text chunks, replace [\w.\"]+ with \S+.
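A quick comparison of the two classes, using a made-up line with a hyphenated word (`from cross-road`, not from the question): `[\w.\"]+` stops at the hyphen, while `\S+` takes the whole chunk. On the question's `(on pike)` line, `\S+` also captures the closing parenthesis, which is why it only fits when the text after the keyword really is a clean non-whitespace chunk.

```python
import re

narrow = re.compile(r"\b((?:from|on)\s+)([\w.\"]+)", re.IGNORECASE)
broad = re.compile(r"\b((?:from|on)\s+)(\S+)", re.IGNORECASE)

# Hypothetical line with a hyphenated word (not from the question)
sample = "from cross-road"
print(narrow.sub(r"\1(1_\2)", sample))  # from (1_cross)-road
print(broad.sub(r"\1(1_\2)", sample))   # from (1_cross-road)

# On the question's third line, \S+ captures the ")" as part of the word
print(narrow.findall("(on pike)"))  # [('on ', 'pike')]
print(broad.findall("(on pike)"))   # [('on ', 'pike)')]
```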

Also, pay attention to the replacement pattern: it is defined with a raw string literal, r"...", so that \1 and \2 reach the regex engine without having to double every backslash.
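To see why the raw literal matters: in a normal string literal, `"\1"` is an octal escape that turns into the control character `\x01`, so the backreference never reaches `re`. The sketch compares the raw form, the hand-escaped form and the named-group form `\g<1>` (all standard `re.sub` replacement syntax) on the question's input.

```python
import re

text = "with crossroad\nfrom crossroad\n(on pike)"
pattern = re.compile(r"\b((?:from|on)\s+)([\w.\"]+)", re.IGNORECASE)

raw = r"\1(1_\2)"          # backslashes reach re exactly as typed
escaped = "\\1(1_\\2)"     # same string, each backslash doubled by hand
named = r"\g<1>(1_\g<2>)"  # explicit group references, same effect
plain = "\1(1_\2)"         # "\1" and "\2" become the control chars \x01 and \x02

print(raw == escaped)    # True
print(plain == escaped)  # False
print(pattern.sub(raw, text))
```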

Pattern details

  • \b - a word boundary, to make sure we match whole words
  • ((?:from|on)\s+) - Group 1 (\1): from or on, then one or more whitespace characters
  • ([\w.\"]+) - Group 2 (\2): one or more word, . or " characters
  • \S+ - matches one or more characters other than whitespace.
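The pattern details above can be written straight into the pattern with `re.VERBOSE`, which ignores unescaped whitespace and `#` comments outside character classes. This is only a restatement of the same regex; the capitalised `From Main` input is a hypothetical line used to show the effect of `re.IGNORECASE`.

```python
import re

# Same pattern as the answer, with each part commented inline
rgx = re.compile(
    r"""
    \b                    # word boundary: the keyword must start a word
    ( (?:from|on) \s+ )   # group 1: from or on, then one or more whitespace chars
    ( [\w."]+ )           # group 2: one or more word chars, dots or double quotes
    """,
    re.IGNORECASE | re.VERBOSE,
)

text = "with crossroad\nfrom crossroad\n(on pike)"
print(rgx.sub(r"\1(1_\2)", text))
# with crossroad
# from (1_crossroad)
# (on (1_pike))

# IGNORECASE also catches a capitalised keyword (hypothetical input)
print(rgx.sub(r"\1(1_\2)", "From Main"))  # From (1_Main)
```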

My solution to this would be the following:

import re
from typing import List

string = """with crossroad
from crossroad
(on pike)"""

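# characters that stop the capture: a newline or a literal ")"
exclusion_list: List[str] = ["\n", "\\)"]
string = re.sub(fr"from ([^{''.join(exclusion_list)}]*)", r"from (1_\g<1>)", string)  # wrap the text after "from"
string = re.sub(fr"on ([^{''.join(exclusion_list)}]*)", r"on (1_\g<1>)", string)  # wrap the text after "on"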
print(string)

Output:

with crossroad
from (1_crossroad)
(on (1_pike))

This assumes there is a \n character between lines, so that the whole expression after the keyword is captured. In this code, each re.sub call replaces every occurrence of one case, either from XXX or on XXX.
Additionally, note that this works in this case but might break in others: for instance, if you had [on pike], the resulting line would be [on (1_pike]). You might want to add some characters to the exclusion list.
The exclusion list is then added to the pattern by using a formatted (f) raw (r) string. This captures everything on the line until one of the excluded characters is present.
This has one major consequence: the characters in the exclusion list need to be properly escaped to achieve your goal. For instance, if you wanted to capture only the first word after from and on, you would add the space as an excluded character. Written as \s, the pattern itself needs \s, so the list entry must be "\\s" (we need to double escape so that a single backslash ends up in the pattern).
Finally, we are using the same exclusion list for both cases here; you can of course use two different lists.
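One way to avoid escaping the exclusion list by hand is to pass every excluded character through `re.escape` before building the character class. The sketch below is a variant of the answer, not its exact code: it merges the two calls into one pattern with a `(from|on)` group, adds `\b` so that `on` inside words such as `button` is skipped, uses `+` instead of `*` so an empty capture is never wrapped, and excludes a space to keep only the first word. The extra input lines and the choice of excluded characters are illustrative assumptions.

```python
import re

# Question's input plus two hypothetical lines ([on pike] and a word ending in "on")
text = "with crossroad\nfrom crossroad\n(on pike)\n[on pike]\nbutton down"

# Characters that end the captured text (illustrative choice); re.escape
# makes each one safe inside a character class, so no manual double escaping
exclusions = ["\n", ")", "]", " "]
char_class = "".join(re.escape(c) for c in exclusions)

# \b stops "on" from matching inside words such as "button"
pattern = re.compile(fr"\b(from|on) ([^{char_class}]+)")

print(pattern.sub(r"\1 (1_\2)", text))
# with crossroad
# from (1_crossroad)
# (on (1_pike))
# [on (1_pike)]
# button down
```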

You can use a positive lookbehind to match from and on without including them in the match.

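This regex, (?<=(from|on) )([^\s]+), will match any string after from and on so that you can replace it. Note that Python's built-in re module only accepts fixed-width lookbehinds, so in Python the alternation has to be split into two lookbehinds: (?:(?<=\bfrom )|(?<=\bon )).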

You can see it in action here.
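A sketch of the lookbehind idea under Python's `re`, assuming the question's input. A single lookbehind over `(from|on)` spans 3 or 5 characters, which `re` rejects (the third-party `regex` module does allow variable-length lookbehind); two separate fixed-width lookbehinds are accepted. The `\b` and the `[^\s)]` class, which keeps the closing parenthesis out of the word, are my additions. Because the keyword is not part of the match, the replacement only needs to wrap the word.

```python
import re

text = "with crossroad\nfrom crossroad\n(on pike)"

# A single lookbehind over (from|on) has variable width, which re rejects
try:
    re.compile(r"(?<=(from|on) )([^\s]+)")
except re.error as exc:
    print("re.error:", exc)

# Two fixed-width lookbehinds (5 and 3 chars) are accepted; \b has zero width
lookbehind = re.compile(r"(?:(?<=\bfrom )|(?<=\bon ))([^\s)]+)")

# Only the word is matched, so the keyword is not repeated in the replacement
print(lookbehind.sub(r"(1_\1)", text))
# with crossroad
# from (1_crossroad)
# (on (1_pike))
```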
