
python search/replace regex with sed-like expression

I'd like to implement a sed-like search-and-replace in Python.

Now obviously, Python has the re module:

import re
re.sub("([A-Z]+)", r"\1-\1", "123 ABC 456")

However, I would like to specify the search/replace operation in a single string, like in sed (leaving aside any escaping issues for now):

s/([A-Z]+)/\1-\1/g

The reason I prefer this syntax is that the actual search-and-replace specification is supplied by the user, and I think it is simpler for the user to specify a single search/replace string rather than both a pattern and a template.

Update

I'm only interested in sed's s (search/replace) command, for single lines (no special extensions). The use-case is really to allow users to provide a string-transformation (with groups) for hostnames.

Any ideas?

My first thought was just to split it on / and pass the pieces as arguments to re.sub.
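As a minimal sketch of that first idea, assuming neither the pattern nor the replacement contains a slash (the name naive_sed_sub is hypothetical, and only the g flag is interpreted):

```python
import re

def naive_sed_sub(sed, s):
    """Apply a sed-style s/pattern/repl/flags expression by splitting on every '/'.

    Assumption: neither the pattern nor the replacement contains a slash.
    """
    # Unpacking raises ValueError if the split yields anything but four parts
    command, pattern, repl, flags = sed.split('/')
    if command != 's':
        raise ValueError("only the s command is supported")
    # re.sub treats count=0 as "replace all"; without g only the first match is replaced
    count = 0 if 'g' in flags else 1
    return re.sub(pattern, repl, s, count=count)

print(naive_sed_sub(r's/([A-Z]+)/\1-\1/g', '123 ABC 456'))  # prints 123 ABC-ABC 456
```

An expression with an escaped slash, such as s/\\/\//g, splits into more than four parts here and fails with a ValueError.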

It turns out this is rather complicated, and I'm pretty sure it's not bulletproof, so I offer this as a starting point.

The thing is, what if we want to deal with slashes, as in replacing backslashes with slashes? Then the sed expression will be

's/\\/\//g'

I have to split it on a slash that is not preceded by a backslash:

_, pattern, repl, options = re.split(r'(?<!\\)/', sed)

To make it more complicated, the slash can be preceded by two backslashes (an escaped backslash), in which case it is a real delimiter, so:

_, pattern, repl, options = re.split(r'(?<![^\\]\\)/', sed)

And re.sub will look like

re.sub(pattern, repl, s, count='g' not in options)

Oops, no: in Python a slash doesn't have to be escaped, so the \/ in the replacement has to be turned back into a plain /:

re.sub(pattern, re.sub(r'\\/', '/', repl), s, count='g' not in options)

>>> import re
>>> s = r'\some\windows\path'
>>> sed = r's/\\/\//g'
>>> _, pattern, repl, options = re.split(r'(?<![^\\]\\)/', sed)
>>> re.sub(pattern, re.sub(r'\\/', '/', repl), s, count='g' not in options)
'/some/windows/path'
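The steps above can be collected into one helper. This is only a sketch under the same assumptions as the session above: the name sed_sub is hypothetical, only the g flag is interpreted, and the splitting regex is the same not-quite-bulletproof one.

```python
import re

# A slash is a delimiter unless it is preceded by a single escaping backslash
_DELIMITER = re.compile(r'(?<![^\\]\\)/')

def sed_sub(sed, s):
    """Apply a sed-style s/pattern/repl/flags expression to the string s."""
    parts = _DELIMITER.split(sed)
    if len(parts) != 4 or parts[0] != 's':
        raise ValueError("expected an expression of the form s/pattern/repl/flags")
    _, pattern, repl, options = parts
    # Python does not need slashes escaped, so turn \/ back into /
    repl = repl.replace('\\/', '/')
    # re.sub treats count=0 as "replace all"; without g only the first match is replaced
    count = 0 if 'g' in options else 1
    return re.sub(pattern, repl, s, count=count)

print(sed_sub(r's/\\/\//g', r'\some\windows\path'))  # prints /some/windows/path
```

Spelling out count as 0 or 1 does the same thing as count='g' not in options, which relies on True and False behaving as 1 and 0.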

Python's re just doesn't support this syntax. If you want such a tool, you'll need to develop your own API, so as to parse a sed-like command and execute the corresponding re function.

You could write a function that, given a sed-like s/ command, parses it and returns the corresponding re function. The returned function can then be used on any string.

import re

def run_sed_sub(command):
    # Split on slashes that are not preceded by a backslash (negative lookbehind)
    parts = re.split(r"(?<!\\)/", command)
    if parts[0] != 's':
        raise ValueError("Not a sub command")

    # parts[1] is the pattern, parts[2] the replacement; trailing flags are ignored
    regex = re.compile(parts[1])
    return lambda s: regex.sub(parts[2], s)

>>> func = run_sed_sub(r"s/Hello/Goodbye/g")
>>> print(func("Hello, world!"))
Goodbye, world!

>>> func = run_sed_sub(r"s/([A-Z]+)/\1-\1/g")
>>> print(func("123 ABC 456"))
123 ABC-ABC 456

There are some edge cases that would probably be painful to handle, such as line breaks, but the idea is there. You might also want to replace the slashes that were escaped sed-wise with normal slashes, so parts = [re.sub(r"\\/", "/", p) for p in parts] should do the trick.

I'm not sure either how exactly you would handle the flags at the end, but I suppose it's not really difficult if you know what behaviour you're expecting.
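One possible way to handle the trailing flags, sketched under the assumption that only g (replace every match) and i (case-insensitive matching, which some sed implementations offer) are wanted. The helper name parse_flags is hypothetical, and it simply translates the flag letters into the count and flags arguments of re.sub:

```python
import re

def parse_flags(flags):
    """Translate sed-style flag letters into (count, re_flags) for re.sub.

    Assumption: only 'g' and 'i' are supported; anything else is rejected.
    """
    count = 1          # sed replaces only the first match by default
    re_flags = 0
    for letter in flags:
        if letter == 'g':
            count = 0  # re.sub treats count=0 as "replace every match"
        elif letter == 'i':
            re_flags |= re.IGNORECASE
        else:
            raise ValueError(f"unsupported flag: {letter!r}")
    return count, re_flags

count, re_flags = parse_flags('gi')
print(re.sub('hello', 'Goodbye', 'Hello, hello!', count=count, flags=re_flags))
# prints Goodbye, Goodbye!
```

Rejecting unknown letters, rather than ignoring them, keeps a user-supplied expression from silently behaving differently from what was typed.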

I would nevertheless add that the effort of implementing such a tool is probably much greater than that of just learning Python's re.
