
How to replace only part of the match with python re.sub

I need to match two cases with one regular expression and do a replacement:

'long.file.name.jpg' -> 'long.file.name_suff.jpg'

'long.file.name_a.jpg' -> 'long.file.name_suff.jpg'

I'm trying to do the following:

re.sub('(\_a)?\.[^\.]*$' , '_suff.',"long.file.name.jpg")

But this cuts off the extension '.jpg', and I'm getting

long.file.name_suff.

instead of long.file.name_suff.jpg. I understand that this is because of the [^.]*$ part, but I can't leave it out, because I have to find the last occurrence of '_a' to replace, or the last '.'.

Is there a way to replace only part of the match?

Put a capture group around the part that you want to keep, and then include a reference to that capture group in the replacement text.

re.sub(r'(\_a)?\.([^\.]*)$' , r'_suff.\2',"long.file.name.jpg")
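
The same idea can be wrapped in a small helper that covers both inputs from the question. This is only a sketch: the name `add_suffix` and its `suffix` parameter are made up for illustration, and a function is used as the replacement so that the suffix is inserted literally.

```python
import re

def add_suffix(filename, suffix="_suff"):
    """Hypothetical helper: drop an optional trailing '_a' and insert a suffix before the extension."""
    # Group 1 is the optional "_a" that gets dropped; group 2 is the extension that is kept.
    pattern = r'(_a)?\.([^.]*)$'
    # The callable rebuilds the tail of the name from the suffix and the captured extension.
    return re.sub(pattern, lambda m: f"{suffix}.{m.group(2)}", filename)

print(add_suffix("long.file.name.jpg"))    # long.file.name_suff.jpg
print(add_suffix("long.file.name_a.jpg"))  # long.file.name_suff.jpg
```

A name without any dot does not match the pattern and is returned unchanged.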

re.sub(r'(?:_a)?\.([^.]*)$', r'_suff.\1', "long.file.name.jpg")

?: starts a non-capturing group (SO answer), so (?:_a) matches the _a but does not number it as a group; the question mark that follows makes it optional.

So in English, this says: match the ending .<anything>, whether or not it follows the pattern _a.

Another way to do this would be to use a lookbehind (see here). I mention this because lookbehinds are super useful, but I didn't know about them for 15 years of writing REs.

Just put the expression for the extension into a group, capture it and reference the match in the replacement:

re.sub(r'(?:_a)?(\.[^\.]*)$' , r'_suff\1',"long.file.name.jpg")

Additionally, using the non-capturing group (?:…) prevents re from storing unneeded information.
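
To see what the non-capturing group changes, it may help to compare what each variant stores for a match. This is a small sketch; the variable names are only illustrative.

```python
import re

name = "long.file.name_a.jpg"

# Capturing version: the optional "_a" takes up group 1, the extension becomes group 2.
capturing = re.search(r'(_a)?(\.[^.]*)$', name)
print(capturing.groups())      # ('_a', '.jpg')

# Non-capturing version: only the extension is stored, so it is group 1.
non_capturing = re.search(r'(?:_a)?(\.[^.]*)$', name)
print(non_capturing.groups())  # ('.jpg',)
```

With the non-capturing form, the replacement can refer to the extension as \1 and does not need to skip over a group that is never used.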

You can do it by excluding parts of the match from the replacement. I mean, you can tell the regex module: "match with this pattern, but replace only a piece of it".

re.sub(r'(?<=long.file.name)(\_a)?(?=\.([^\.]*)$)' , r'_suff',"long.file.name.jpg")
# 'long.file.name_suff.jpg'

The long.file.name and .jpg parts are used for matching, but they are excluded from the replacement.
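
The lookbehind above spells out the file name because Python's re module only accepts fixed-width lookbehinds. A possible variation, sketched here under that assumption, keeps only the lookahead so that it works for any name. The `count=1` argument is deliberate: the optional group can also match the empty string right after a consumed _a, and Python 3.7+ replaces such adjacent empty matches too. The function name `add_suffix_lookahead` is invented for this example.

```python
import re

def add_suffix_lookahead(filename, suffix="_suff"):
    """Hypothetical helper: replace an optional '_a' that sits right before the last extension."""
    # (?:_a)? consumes the optional "_a"; the lookahead checks for ".ext" at the end
    # of the string without consuming it, so the extension is never replaced.
    pattern = r'(?:_a)?(?=\.[^.]*$)'
    # count=1 stops after the first match and avoids a second, empty match
    # at the position just before the extension.
    return re.sub(pattern, lambda m: suffix, filename, count=1)

print(add_suffix_lookahead("long.file.name.jpg"))    # long.file.name_suff.jpg
print(add_suffix_lookahead("long.file.name_a.jpg"))  # long.file.name_suff.jpg
```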

I wanted to use capture groups to replace a specific part of a string to help me parse it later. Consider the example below:

s= '<td> <address> 110 SOLANA ROAD, SUITE 102<br>PONTE VEDRA BEACH, FL32082 </address> </td>'

re.sub(r'(<address>\s.*?)(<br>)(.*?\<\/address>)', r'\1 -- \3', s)
##'<td> <address> 110 SOLANA ROAD, SUITE 102 -- PONTE VEDRA BEACH, FL32082 </address> </td>'

print(re.sub('name(_a)?','name_suff','long.file.name_a.jpg'))
# long.file.name_suff.jpg

print(re.sub('name(_a)?','name_suff','long.file.name.jpg'))
# long.file.name_suff.jpg
