
Python search/replace regex with sed-like expression

I'd like to implement a sed-like search-and-replace in Python.

Now obviously, Python has the re module:

import re
re.sub("([A-Z]+)", r"\1-\1", "123 ABC 456")

However, I would like to specify the search/replace operation in a single string, like in sed (leaving aside any escaping issues for now):

s/([A-Z]+)/\1-\1/g

I prefer this syntax because the actual search-and-replace specification is supplied by the user, and I think it is simpler for the user to specify a single search/replace string than both a pattern and a template.

Update

I'm only interested in sed's s (search/replace) command, for single lines (no special extensions). The use case is really to allow users to provide a string transformation (with groups) for hostnames.

Any ideas?

My first thought was just to split it on / and pass the pieces as arguments to re.sub.
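As a rough illustration of that first idea, here is a minimal sketch. The helper name naive_sed_sub is made up for this example, and it assumes the command contains no escaped slashes at all:

```python
import re

def naive_sed_sub(command, text):
    # Hypothetical helper: split on every "/" and ignore escaping entirely.
    _, pattern, repl, flags = command.split("/")
    # sed replaces only the first match unless "g" is given;
    # re.sub treats count=0 as "replace every match".
    count = 0 if "g" in flags else 1
    return re.sub(pattern, repl, text, count=count)

print(naive_sed_sub(r"s/([A-Z]+)/\1-\1/g", "123 ABC 456"))  # 123 ABC-ABC 456
```

As soon as the pattern or the replacement contains an escaped slash, the split yields more than four pieces and the unpacking raises ValueError.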

It turns out this is rather complicated, and I'm pretty sure it's not bulletproof, so take this as a starting point.

The thing is, what if we want to deal with slashes, as in replacing backslashes with slashes? Then the sed expression will be

's/\\/\//g'

I have to split it on slashes that are not preceded by a backslash:

_, pattern, repl, options = re.split(r'(?<!\\)/', sed)

To make it more complicated, the slash can be preceded by two backslashes (an escaped backslash), in which case it is a delimiter after all, so:

_, pattern, repl, options = re.split(r'(?<![^\\]\\)/', sed)
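To see the difference between the two lookbehinds, the following sketch splits the same expression with both. The variable names simple and refined are only for illustration:

```python
import re

sed = r's/\\/\//g'   # replace each backslash with a forward slash

# Any slash preceded by a backslash is treated as escaped:
simple = re.split(r'(?<!\\)/', sed)
# A slash is treated as escaped only if the backslash before it
# is not itself preceded by another backslash:
refined = re.split(r'(?<![^\\]\\)/', sed)

print(len(simple))   # 3 pieces: pattern and replacement are glued together
print(len(refined))  # 4 pieces: 's', pattern, replacement, flags
```

The refined version is still a heuristic: with three backslashes before a slash (an escaped backslash followed by an escaped slash) it splits where it should not.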

And re.sub will look like

re.sub(pattern, repl, s, count='g' not in options)

Oops, no: in Python a slash doesn't have to be escaped, so the sed-style \/ in the replacement has to be turned back into /:

re.sub(pattern, re.sub(r'\\/', '/', repl), s, count='g' not in options)

>>> import re
>>> s = r'\some\windows\path'
>>> sed = r's/\\/\//g'
>>> _, pattern, repl, options = re.split(r'(?<![^\\]\\)/', sed)
>>> re.sub(pattern, re.sub(r'\\/', '/', repl), s, count='g' not in options)
'/some/windows/path'
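Since the lookbehind only inspects two characters, one possible way to handle any number of backslashes is to scan the command character by character instead. The sketch below is an assumption, not part of the approach above: the names split_sed_command and sed_sub are invented, and the delimiter is assumed to always be /.

```python
import re

def split_sed_command(command):
    """Split 's/pattern/repl/flags' on unescaped slashes (hypothetical helper)."""
    parts, current, i = [], [], 0
    while i < len(command):
        ch = command[i]
        if ch == "\\" and i + 1 < len(command):
            nxt = command[i + 1]
            # "\/" is the sed escape for a literal slash; any other escape
            # (including "\\") is kept verbatim for the re module.
            current.append("/" if nxt == "/" else ch + nxt)
            i += 2
        elif ch == "/":
            # An unescaped slash closes the current piece.
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    if len(parts) != 4 or parts[0] != "s":
        raise ValueError("expected a command of the form s/pattern/repl/flags")
    return parts[1], parts[2], parts[3]

def sed_sub(command, text):
    pattern, repl, flags = split_sed_command(command)
    # count=0 replaces every match; without "g" only the first one.
    return re.sub(pattern, repl, text, count=0 if "g" in flags else 1)

print(sed_sub(r's/\\/\//g', r'\some\windows\path'))  # /some/windows/path
```

Because escapes are consumed in pairs, a slash is treated as a delimiter exactly when an even number of backslashes precedes it, and the \/ unescaping happens in the same pass.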

Python's re just doesn't support this syntax. If you want such a tool, you'll need to develop your own API, which has to parse a sed-like command and execute the corresponding re function.

You could write a function that, given a sed-like s/ command, parses it and returns the corresponding re function. The returned function can then be applied to any string.

import re

def run_sed_sub(command):
    # split on unescaped slashes (negative lookbehind for a backslash)
    parts = re.split(r"(?<!\\)/", command)
    if len(parts) != 4 or parts[0] != 's':
        raise ValueError("Not a sub command")

    regex = re.compile(parts[1])
    # the trailing flags (parts[3]) are ignored: every match is replaced
    return lambda s: regex.sub(parts[2], s)

>>> func = run_sed_sub(r"s/Hello/Goodbye/g")
>>> print(func("Hello, world!"))
Goodbye, world!

>>> func = run_sed_sub(r"s/([A-Z]+)/\1-\1/g")
>>> print(func("123 ABC 456"))
123 ABC-ABC 456

There are some edge cases that would probably be painful to handle, such as line breaks, but the idea is here. You might also want to replace the slashes that were escaped sed-wise with normal slashes, so parts = [re.sub(r"\\/", "/", p) for p in parts] should do the trick.

I'm also not sure exactly how you would handle the flags at the end, but I suppose it's not really difficult if you know what behaviour you're expecting.
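For instance, one plausible mapping, given purely as a sketch, is to treat g as "replace every match" and i as case-insensitive matching (the latter mirrors a GNU sed extension). The helper name apply_flags and the choice to reject any other flag are assumptions:

```python
import re

def apply_flags(pattern, repl, flags, text):
    # Hypothetical helper; the set of supported flag letters is an assumption.
    unknown = set(flags) - {"g", "i"}
    if unknown:
        raise ValueError(f"unsupported flags: {sorted(unknown)}")
    re_flags = re.IGNORECASE if "i" in flags else 0
    count = 0 if "g" in flags else 1   # count=0 means "replace every match"
    return re.sub(pattern, repl, text, count=count, flags=re_flags)

print(apply_flags("hello", "bye", "", "hello Hello hello"))    # bye Hello hello
print(apply_flags("hello", "bye", "g", "hello Hello hello"))   # bye Hello bye
print(apply_flags("hello", "bye", "gi", "hello Hello hello"))  # bye bye bye
```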

I would add, nevertheless, that the effort of implementing such a tool is probably much greater than that of just learning Python's re.
