Python regex: substitute group 1 with a certain string
I wonder how to substitute group 1 with a certain string using a regex in Python.
Question 1:
str = "aaa bbb ccc"
regex = "\baaa (bbb)\b"
repl = "111 bbb 222"
Use the regex to match str: it matches "aaa bbb". Then replace group 1, "bbb", with "111 bbb 222" to get the result "aaa 111 bbb 222 ccc".
str_repl = "aaa 111 bbb 222 ccc"
Thanks to @RomanPerekhrest and @janos for the lookbehind method.
And I wonder how to solve a more general scenario:
Question 2:
s1 = "bBb"
regex = "(?<=\baaa )" + s1 + "\b" # may not suitable
repl = "XxX " + s1 + " YyY"
Target:
s0 = "aaa bBb ccc"
s0_repl = "aaa XxX bBb YyY ccc"
s1 = "aaa bbb ccc"
no match
s2 = "AAA bBb ccc"
s2_repl = "AAA XxX bBb YyY ccc"
When matching against the original string, ignore case for every substring except s1.
Question 3:
s1 = "bbb"
regex = "(?<=\baaa )" + s1 + "\b" # may not suitable
repl = "XxX " + s1 + " YyY"
Target:
s0 = "aaa bBb ccc"
s0_repl = "aaa XxX bBb YyY ccc"
s1 = "aaa bbb ccc"
s1_repl = "aaa XxX bbb YyY ccc"
s2 = "AAA BBB ccc"
s2_repl = "AAA XxX BBB YyY ccc"
When matching and substituting in the original string, ignore case for every substring except s1.
Question 4:
Is there a way to substitute group 1 in the original string with a regex in Python?
You can use the re package and a positive lookbehind:
import re
s = "aaa bbb ccc"
regex = r"\b(?<=aaa )(bbb)\b"
repl = "111 bbb 222"
print(re.sub(regex, repl, s))
This will produce:
aaa 111 bbb 222 ccc
Notice the changes I made there:
- The aaa prefix in the regex is wrapped in (?<=...). This means: match bbb if it follows aaa, without including aaa in the text to replace. This is called a positive lookbehind. Without this change to your regex, the aaa would disappear together with bbb.
- Regex strings should be written as r"...", to make them raw strings, in order to avoid problems with escape sequences. In a normal string, "\b" is a backspace character, not a word boundary.
- I renamed the str variable to s, because str is the name of a built-in type in Python (not a reserved word, but assigning to it shadows the built-in), as @elena also pointed out.
To replace the sequence bbb only where it is preceded by the sequence aaa, use the following approach:
import re
s = "aaa bbb ccc"
regex = r"(?<=aaa )bbb\b"
repl = "111 bbb 222"
str_replaced = re.sub(regex, repl, s)
print(str_replaced)
The output:
aaa 111 bbb 222 ccc
(?<=aaa ) - positive lookbehind assertion; ensures that "bbb" is preceded by "aaa "
http://www.regular-expressions.info/lookaround.html
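The same lookbehind can probably be stretched to cover the case rules in Questions 2 and 3. The sketch below is one possible reading of the targets listed there, not a confirmed answer from the thread: for Question 2 only the aaa prefix ignores case (using the scoped inline flag (?i:...), which assumes Python 3.6 or later), and for Question 3 the whole pattern ignores case while \g<0> puts the matched text back with its original casing. The names wrap_q2 and wrap_q3 are made up for illustration.

```python
import re

s1 = "bBb"  # the word to wrap; must match with exactly this casing in Question 2

# Question 2: only the "aaa" prefix ignores case, through the scoped flag (?i:...).
# re.escape guards against regex metacharacters inside s1.
pattern_q2 = r"(?<=\b(?i:aaa) )" + re.escape(s1) + r"\b"

def wrap_q2(text):
    # Hypothetical helper: wraps s1 when it follows "aaa " in any casing.
    return re.sub(pattern_q2, "XxX " + s1 + " YyY", text)

# Question 3: the whole pattern ignores case; \g<0> re-inserts the matched
# word exactly as it appeared in the input.
pattern_q3 = r"(?<=\baaa )" + re.escape("bbb") + r"\b"

def wrap_q3(text):
    # Hypothetical helper: wraps "bbb" in any casing, keeping the input casing.
    return re.sub(pattern_q3, r"XxX \g<0> YyY", text, flags=re.IGNORECASE)

print(wrap_q2("AAA bBb ccc"))  # AAA XxX bBb YyY ccc
print(wrap_q2("aaa bbb ccc"))  # aaa bbb ccc (no match, unchanged)
print(wrap_q3("AAA BBB ccc"))  # AAA XxX BBB YyY ccc
```

If s1 could contain backslashes, passing a function as the replacement instead of a string avoids having them interpreted as escapes.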
First of all, don't use str as a variable name. It is the name of a built-in type in Python, and assigning to it shadows the built-in.
import re
str1 = "aaa bbb ccc"
re.sub("bbb", "111 bbb 222", str1)
Out[11]: 'aaa 111 bbb 222 ccc'
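For Question 4 taken literally (replace only what group 1 captured and keep the rest of the match), one option is to pass a function to re.sub and rebuild each match around the group's span. This is a minimal sketch; sub_group is a hypothetical helper written for this example, not something the re module provides.

```python
import re

def sub_group(pattern, repl, string, group=1, flags=0):
    """Replace only the text captured by `group`, keeping the rest of each match."""
    def _replace(m):
        start, end = m.span(group)
        if start == -1:
            # The group did not take part in this match; leave the match as is.
            return m.group(0)
        whole = m.group(0)
        offset = m.start(0)  # group spans are absolute, so shift them into `whole`
        return whole[:start - offset] + repl + whole[end - offset:]
    return re.sub(pattern, _replace, string, flags=flags)

print(sub_group(r"\baaa (bbb)\b", "111 bbb 222", "aaa bbb ccc"))
# aaa 111 bbb 222 ccc
```

Unlike the lookbehind approach, this works when the text before the group has variable length, because the prefix is matched normally and then copied back. Here repl is inserted as literal text, so backreferences in it are not expanded.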