
Python regex: substitute group 1 with a certain string

I wonder how to substitute group 1 with a certain string using a regex in Python.

Question 1:

str = "aaa bbb ccc"
regex = "\baaa (bbb)\b"
repl = "111 bbb 222"

Match str with the regex, which matches "aaa bbb", then replace group 1 ("bbb") with "111 bbb 222" to get the result "aaa 111 bbb 222 ccc":

str_repl = "aaa 111 bbb 222 ccc"

Thanks to @RomanPerekhrest and @janos for the lookbehind method.

I also wonder how to solve a more general scenario:

Question 2:

s1 = "bBb"
regex = "(?<=\baaa )" + s1 + "\b"  # may not suitable
repl = "XxX " + s1 + " YyY"

Target:

s0 = "aaa bBb ccc"
s0_repl = "aaa XxX bBb YyY ccc"

s1 = "aaa bbb ccc"
no match

s2 = "AAA bBb ccc"
s2_repl = "AAA XxX bBb YyY ccc"

When matching against the original string, ignore case for everything except s1.

Question 3:

s1 = "bbb"
regex = "(?<=\baaa )" + s1 + "\b"  # may not suitable
repl = "XxX " + s1 + " YyY"

Target:

s0 = "aaa bBb ccc"
s0_repl = "aaa XxX bBb YyY ccc"

s1 = "aaa bbb ccc"
s1_repl = "aaa XxX bbb YyY ccc"

s2 = "AAA BBB ccc"
s2_repl = "AAA XxX BBB YyY ccc"

Ignore case when matching against the original string, but keep the original case of the matched text when substituting.

Question 4:

Is there a way to substitute group 1 in the original string using a regex in Python?

You can use the re package and a positive lookbehind:

import re
s = "aaa bbb ccc"
regex = r"\b(?<=aaa )(bbb)\b"
repl = "111 bbb 222"
print(re.sub(regex, repl, s))

This will produce:

aaa 111 bbb 222 ccc

Notice the changes I made:

  • The aaa prefix in the regex is wrapped in (?<=...). This means: match bbb if it follows aaa, without including aaa in the text to replace. This is called a positive lookbehind. Without this change to your regex, aaa would disappear together with bbb.
  • Regular expression strings should be written as r"..." to make them raw strings and avoid problems with escape sequences. For example, in a regular string "\b" is a backspace character rather than a word boundary.
  • I renamed the str variable to s, because str is the name of a built-in type in Python and assigning to it shadows the built-in, as @elena also pointed out.
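Regarding Question 4: re.sub always replaces the whole match, so replacing only group 1 needs either a lookbehind as above or a replacement function. The following is a minimal sketch of the second option, not a built-in feature; sub_group is a hypothetical helper name, and repl is inserted literally (backreferences in it are not expanded).

```python
import re

def sub_group(pattern, repl, text, group=1):
    """Replace only the given group of each match, keeping the rest of the match."""
    def replace(m):
        whole = m.group(0)
        # If the group did not take part in the match, leave the match untouched.
        if m.start(group) == -1:
            return whole
        # Offsets of the group relative to the start of the whole match.
        start = m.start(group) - m.start(0)
        end = m.end(group) - m.start(0)
        return whole[:start] + repl + whole[end:]
    return re.sub(pattern, replace, text)

result = sub_group(r"\baaa (bbb)\b", "111 bbb 222", "aaa bbb ccc")
print(result)  # aaa 111 bbb 222 ccc
```

Because the prefix stays inside the match, this also works where the prefix is not fixed-width, which Python's lookbehind does not allow.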

To replace the sequence bbb only when it is preceded by the sequence aaa, use the following approach:

import re

s = "aaa bbb ccc"
regex = r"(?<=aaa )bbb\b"
repl = "111 bbb 222"

str_replaced = re.sub(regex, repl, s)
print(str_replaced)

The output:

aaa 111 bbb 222 ccc

(?<=aaa ) is a positive lookbehind assertion; it ensures that "bbb" is preceded by "aaa ".

http://www.regular-expressions.info/lookaround.html
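For Questions 2 and 3, one possible approach is a scoped inline flag, (?i:...), which makes only part of the pattern case-insensitive (available in Python 3.6 and later). The sketch below is an assumption about what the asker wants, based on the targets listed in the question; wrap_after_prefix and its ignore_s1_case parameter are hypothetical names. It uses capture groups instead of a lookbehind so that the matched text keeps its original case in the result.

```python
import re

def wrap_after_prefix(text, s1, ignore_s1_case=False):
    """Wrap s1 as 'XxX <s1> YyY' when it directly follows the word 'aaa'."""
    # re.escape keeps any regex metacharacters in s1 literal.
    target = re.escape(s1)
    if ignore_s1_case:
        # Question 3: s1 itself is matched case-insensitively.
        target = "(?i:" + target + ")"
    # The prefix is always matched case-insensitively.
    pattern = r"\b((?i:aaa) )(" + target + r")\b"
    # \g<1> re-emits the prefix and \g<2> the matched text, both as written.
    return re.sub(pattern, r"\g<1>XxX \g<2> YyY", text)

# Question 2: s1 is case-sensitive.
print(wrap_after_prefix("aaa bBb ccc", "bBb"))  # aaa XxX bBb YyY ccc
print(wrap_after_prefix("aaa bbb ccc", "bBb"))  # aaa bbb ccc (no match)
print(wrap_after_prefix("AAA bBb ccc", "bBb"))  # AAA XxX bBb YyY ccc

# Question 3: s1 is case-insensitive, original case is kept.
print(wrap_after_prefix("AAA BBB ccc", "bbb", ignore_s1_case=True))  # AAA XxX BBB YyY ccc
```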

First of all, don't use str as a variable name. It is the name of a built-in type in Python, and assigning to it shadows the built-in.

import re

str1 = "aaa bbb ccc"
re.sub("bbb", "111 bbb 222", str1)
Out[11]: 'aaa 111 bbb 222 ccc'
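Note that a plain "bbb" pattern replaces every occurrence, whether or not aaa comes before it. The short comparison below illustrates the difference from the lookbehind pattern; the second input string is an assumed example that is not in the question.

```python
import re

plain = "bbb"                    # matches bbb anywhere
anchored = r"(?<=\baaa )bbb\b"   # matches bbb only directly after the word aaa
repl = "111 bbb 222"

results = {}
for text in ("aaa bbb ccc", "xxx bbb ccc"):
    results[text] = (re.sub(plain, repl, text), re.sub(anchored, repl, text))
    print(text, "->", results[text])
```

For the first string both patterns give the same result; for the second, only the plain pattern changes the text.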
