
Regex replace (in Python) - a simpler way?

Any time I want to replace a piece of text that is part of a larger piece of text, I always have to do something like:

"(?P<start>some_pattern)(?P<replace>foo)(?P<end>end)"

And then concatenate the start group with the new data for replace, followed by the end group.
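To make the approach concrete, here is a minimal sketch of that concatenation. The pattern, the input string, and the new value "bar" are made up for illustration only:

```python
import re

# Hypothetical pattern and input, chosen only to illustrate the approach.
pattern = re.compile(r"(?P<start>key=)(?P<replace>\w+)(?P<end>;)")
text = "key=foo;"

m = pattern.search(text)
if m:
    # Rebuild the matched region from the untouched groups plus the new value.
    rebuilt = m.group("start") + "bar" + m.group("end")
    result = text[:m.start()] + rebuilt + text[m.end():]
else:
    # No match: leave the text unchanged.
    result = text

print(result)
```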

Is there a better method for this?

>>> import re
>>> s = "start foo end"
>>> s = re.sub("foo", "replaced", s)
>>> s
'start replaced end'
>>> s = re.sub("(?<= )(.+)(?= )", lambda m: "can use a callable for the %s text too" % m.group(1), s)
>>> s
'start can use a callable for the replaced text too end'
>>> help(re.sub)
Help on function sub in module re:

sub(pattern, repl, string, count=0)
    Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a callable, it's passed the match object and must return
    a replacement string to be used.

Look in the Python re documentation for lookaheads (?=...) and lookbehinds (?<=...); I'm pretty sure they're what you want. They check that the surrounding text is present, but do not "consume" the parts of the string they match, so those parts are left out of the replacement.
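As a small sketch of what "not consuming" means, the match below covers only "foo", even though the pattern also checks the text on both sides. The sample string is the one used in the session above; the variable names are only illustrative:

```python
import re

text = "start foo end"

# The lookbehind and lookahead assert the surrounding text without matching it.
m = re.search(r"(?<=start )foo(?= end)", text)

matched = m.group(0)  # only the text between the two assertions
span = m.span()       # start and end offsets of "foo" alone

# Because only "foo" is part of the match, only "foo" is replaced.
result = re.sub(r"(?<=start )foo(?= end)", "replaced", text)

print(matched, span, result)
```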

The short version is that you cannot use variable-width patterns in lookbehinds using Python's re module. There is no way to change this:

>>> import re
>>> re.sub("(?<=foo)bar(?=baz)", "quux", "foobarbaz")
'fooquuxbaz'
>>> re.sub("(?<=fo+)bar(?=baz)", "quux", "foobarbaz")

Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    re.sub("(?<=fo+)bar(?=baz)", "quux", string)
  File "C:\Development\Python25\lib\re.py", line 150, in sub
    return _compile(pattern, 0).sub(repl, string, count)
  File "C:\Development\Python25\lib\re.py", line 241, in _compile
    raise error, v # invalid expression
error: look-behind requires fixed-width pattern

This means that you'll need to work around it, the simplest solution being very similar to what you're doing now:

>>> re.sub("(fo+)bar(?=baz)", "\\1quux", "foobarbaz")
'fooquuxbaz'
>>>
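>>> # If you need to turn this into a reusable function:
>>> def replace(start, target, end, replacement, search):
...     # start, target and end are literal strings, so escape them for the pattern.
...     pattern = "(" + re.escape(start) + ")" + re.escape(target) + "(?=" + re.escape(end) + ")"
...     # In the replacement only backslashes are special; \g<1> avoids ambiguity with leading digits.
...     return re.sub(pattern, "\\g<1>" + replacement.replace("\\", "\\\\"), search)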

This doesn't have the elegance of the lookbehind solution, but it's still a very clear, straightforward one-liner. And if you look at what an expert has to say on the matter (he's talking about JavaScript, which lacks lookbehinds entirely, but many of the principles are the same), you'll see that his simplest solution looks a lot like this one.
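A possible variation on the same workaround, offered as a sketch rather than as part of the original answer: pass a function as the replacement, so the captured prefix is re-inserted from the match object and the new text is used literally with no escaping. The name replace_after and its parameters are hypothetical, and here the prefix, target and suffix are assumed to be regex fragments rather than literal strings:

```python
import re

def replace_after(prefix_pattern, target_pattern, suffix_pattern, replacement, text):
    """Replace target_pattern where it follows prefix_pattern and precedes suffix_pattern.

    The prefix may be variable-width because it is captured, not asserted.
    """
    pattern = (
        "(" + prefix_pattern + ")"       # group 1: the prefix to keep
        + "(?:" + target_pattern + ")"   # the part to replace
        + "(?=" + suffix_pattern + ")"   # the suffix is checked but not consumed
    )
    # A callable's return value is inserted as-is, so backslashes in
    # `replacement` are not interpreted as group references.
    return re.sub(pattern, lambda m: m.group(1) + replacement, text)

print(replace_after("fo+", "bar", "baz", "quux", "fooooobarbaz"))
```

Because the suffix sits in a lookahead, it stays in the string and can still take part in a later match.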

I believe that the best idea is just to capture in a group whatever you want to replace, and then replace it by using the start() and end() methods of the match object for that group.

regards

Adrián

import re

# The pattern contains the expression we want to replace as its first group
pat = r"word1\s(.*)\sword2"
test = "word1 will never be a word2"
repl = "replace"

m = re.search(pat, test)

# m is None when nothing matches; group 1 is empty if it captured no text
if m and m.group(1):
    # Keep everything before and after group 1, swapping in the new text
    line = test[:m.start(1)] + repl + test[m.end(1):]
    print(line)
else:
    print("the pattern didn't capture any text")

This will print: 'word1 replace word2'

The group to be replaced can be located at any position in the string.
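The same idea can be wrapped in a small helper. This is only a sketch: replace_group and its group parameter are hypothetical names, and it handles the first match only:

```python
import re

def replace_group(pattern, replacement, text, group=1):
    """Replace the text captured by `group` in the first match of `pattern`."""
    m = re.search(pattern, text)
    # m.start(group) is -1 when the group exists but did not take part in the match.
    if m is None or m.start(group) == -1:
        return text
    return text[:m.start(group)] + replacement + text[m.end(group):]

print(replace_group(r"word1\s(.*)\sword2", "replace", "word1 will never be a word2"))
print(replace_group(r"(\d+) apples", "five", "I have 3 apples today"))
```

Slicing with the group's offsets means the replacement text is never interpreted as a regex template, so it needs no escaping.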
