
Making letters uppercase using re.sub in Python?

In many programming languages, the following

find foo([a-z]+)bar and replace with GOO\U\1GAR

will result in the entire match being made uppercase. I can't seem to find the equivalent in Python; does it exist?

You can pass a function to re.sub() as the replacement, which lets you do this. Here is an example:

 def upper_repl(match):
     return 'GOO' + match.group(1).upper() + 'GAR'

And an example of using it:

 >>> import re
 >>> re.sub(r'foo([a-z]+)bar', upper_repl, 'foobazbar')
 'GOOBAZGAR'
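
As a side note on the "does it exist?" part: as far as I know, the standard re module has no \U case-conversion escape in replacement strings, and recent Python 3 versions (3.7 and later, to my knowledge) raise re.error for an unknown letter escape like that. The sketch below tries the template from the question and then falls back to a callable; the variable names are just for illustration.

```python
import re

pattern = r'foo([a-z]+)bar'

# Try the \U template from the question as a plain replacement string.
try:
    re.sub(pattern, r'GOO\U\1GAR', 'foobazbar')
    template_supported = True
except re.error:
    # Recent Python versions reject unknown letter escapes such as \U in repl.
    template_supported = False

# The callable form does the case conversion in ordinary Python code.
result = re.sub(pattern, lambda m: 'GOO' + m.group(1).upper() + 'GAR', 'foobazbar')

print(template_supported)
print(result)  # GOOBAZGAR
```

So the function-based replacement is the usual way to get this behaviour in Python.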

Do you mean something like this?

>>> import re
>>> x = "foo spam bar"
>>> re.sub(r'foo ([a-z]+) bar', lambda match: r'foo {} bar'.format(match.group(1).upper()), x)
'foo SPAM bar'

For reference, here's the docstring of re.sub (emphasis mine).

Return the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in string by the replacement repl. repl can be either a string or a callable; if a string, backslash escapes in it are processed. If it is a callable, it's passed the match object and must return a replacement string to be used.
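
To make the two forms of repl from the docstring concrete, here is a small sketch that runs the same pattern once with a string and once with a callable. The pattern and input are taken from the question; the variable names are mine.

```python
import re

pattern = re.compile(r'foo([a-z]+)bar')

# String repl: backslash escapes such as \1 are expanded, but the case is unchanged.
as_string = pattern.sub(r'GOO\1GAR', 'foobazbar')

# Callable repl: receives the match object and returns the replacement text.
as_callable = pattern.sub(lambda m: 'GOO' + m.group(1).upper() + 'GAR', 'foobazbar')

print(as_string)    # GOObazGAR
print(as_callable)  # GOOBAZGAR
```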

If you already have a replacement string (template), you may not be keen on swapping it out for the verbosity of m.group(1)+...+m.group(2)+...+m.group(3)... Sometimes it's nice to have a tidy little string.

You can use the match object's expand() method to evaluate a template for the match in the same manner as sub(), allowing you to retain as much of your original template as possible. You can then call upper() on the relevant pieces.

re.sub(r'foo([a-z]+)bar', lambda m: 'GOO' + m.expand(r'\1GAR').upper(), 'foobazbar')

While this is not particularly useful in the example above, and does not help when different parts of the replacement need different treatment, it can be more convenient for longer expressions with many captured groups, such as a regex that censors MAC addresses, where you just want the whole replacement to come out in one case.
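
A rough sketch of that MAC address case, assuming colon-separated addresses and a made-up template that keeps the first three octets and masks the rest. MAC_RE, TEMPLATE and censor_macs are hypothetical names for illustration, not anything from the re module.

```python
import re

# Hypothetical pattern: six colon-separated hex pairs, each in its own group.
MAC_RE = re.compile(r'\b' + ':'.join([r'([0-9a-fA-F]{2})'] * 6) + r'\b')

# The template stays a tidy string: keep three octets, mask the other three.
TEMPLATE = r'\1:\2:\3:xx:xx:xx'

def censor_macs(text):
    # expand() fills in the group references; upper() normalizes the whole result.
    return MAC_RE.sub(lambda m: m.expand(TEMPLATE).upper(), text)

censored = censor_macs('eth0 has address 00:1a:2b:3c:4d:5e')
print(censored)  # eth0 has address 00:1A:2B:XX:XX:XX
```

With six groups, spelling this out as m.group(1) + ':' + m.group(2) + ... would be a lot noisier than one template plus a single upper() call.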

You could use some variation of this:

import re

s = 'foohellobar'
def replfunc(m):
    # keep groups 1 and 3 as matched; uppercase only the middle group
    return m.group(1) + m.group(2).upper() + m.group(3)
re.sub('(foo)([a-z]+)(bar)', replfunc, s)

gives the output:

'fooHELLObar'

For those coming across this on Google...

You can also use re.sub to match repeating patterns. For example, you can convert a string with spaces to camelCase:

import re

def to_camelcase(string):
  string = string[0].lower() + string[1:]  # lowercase the first character
  return re.sub(
    r'\s+(?P<first>[a-z])',                # whitespace followed by a lowercase letter
    lambda m: m.group('first').upper(),    # drop the whitespace, uppercase the letter
    string)

to_camelcase('String to convert')          # --> 'stringToConvert'

