RegEx For Multiple Search & Replace

I'm trying to do a search and replace (for multiple chars) in the following string:

VAR=%2FlkdMu9zkpE8w7UKDOtkkHhJlYZ6CaEaxqmsA%2B7G3e8%3D&

One or more of these escape sequences (%3D, %2F, %2B, %23) can appear anywhere in the string (beginning, middle, or end). Ideally, I'd like to search for all of them at once with a single regex, replace them with =, /, +, and # respectively, and return the final string.

Example 1:

VAR=%2FlkdMu9zkpE8w7UKDOtkkHhJlYZ6CaEaxqmsA%2B7G3e8%3D&

Should return

VAR=/lkdMu9zkpE8w7UKDOtkkHhJlYZ6CaEaxqmsA+7G3e8=&

Example 2:

VAR=s2P0n6I%2Flonpj6uCKvYn8PCjp%2F4PUE2TPsltCdmA%3DRQPY%3D&

Should return

VAR=s2P0n6I/lonpj6uCKvYn8PCjp/4PUE2TPsltCdmA=RQPY=&

I'm not convinced you need regex for this, but it's fairly easy to do with Python:

import re

x = 'VAR=%2FlkdMu9zkpE8w7UKDOtkkHhJlYZ6CaEaxqmsA%2B7G3e8%3D&'

MAPPING = {
    '%3D': '=',
    '%2F': '/',
    '%2B': '+',
    '%23': '#',
}

def replace(match):
    # Leave any %XX sequence that is not in MAPPING unchanged.
    return MAPPING.get(match.group(0), match.group(0))

print(x)
print(re.sub('%[A-Z0-9]{2}', replace, x))

Output:

VAR=%2FlkdMu9zkpE8w7UKDOtkkHhJlYZ6CaEaxqmsA%2B7G3e8%3D&
VAR=/lkdMu9zkpE8w7UKDOtkkHhJlYZ6CaEaxqmsA+7G3e8=&
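
A variation on the same idea, sketched under the assumption that only the four sequences from the question should ever be replaced: the pattern is built from the mapping keys, so any other %XX sequence never matches and is left alone. The names DECODE_MAP, PATTERN, and decode_selected are made up for this illustration.

```python
import re

# Hypothetical names; only the four sequences from the question are decoded.
DECODE_MAP = {
    '%3D': '=',
    '%2F': '/',
    '%2B': '+',
    '%23': '#',
}

# One alternation built from the keys, equivalent to %3D|%2F|%2B|%23.
PATTERN = re.compile('|'.join(re.escape(key) for key in DECODE_MAP))

def decode_selected(text):
    """Replace each listed escape sequence in a single pass."""
    return PATTERN.sub(lambda match: DECODE_MAP[match.group(0)], text)

print(decode_selected('VAR=%2FlkdMu9zkpE8w7UKDOtkkHhJlYZ6CaEaxqmsA%2B7G3e8%3D&'))
```

Because the pattern can only match keys of the dictionary, the callback can index DECODE_MAP directly without a fallback.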

There is no need for a regex to do that in your example. A simple replace method will do:

def rep(s):
    for pat, txt in [['%2F','/'], ['%2B','+'], ['%3D','='], ['%23','#']]:
        s = s.replace(pat, txt)
    return s
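
One thing to watch with chained replace calls: each pass sees the output of the previous one, so the order of the pairs can matter if the input might contain an escaped percent sign. The sketch below is hypothetical (the question's list has no %25 entry, and rep_chained is a made-up name); it adds that entry only to show the effect.

```python
def rep_chained(s, pairs):
    # Apply the replacements one after another, as in rep() above.
    for pat, txt in pairs:
        s = s.replace(pat, txt)
    return s

# Hypothetical input: an encoded literal "%23", i.e. "%2523".
encoded = 'tag=%2523'

# Decoding %25 first lets the next pass decode the result a second time.
percent_first = [('%25', '%'), ('%23', '#')]
# Decoding %25 last avoids the double decode.
percent_last = [('%23', '#'), ('%25', '%')]

print(rep_chained(encoded, percent_first))  # tag=#
print(rep_chained(encoded, percent_last))   # tag=%23
```

With only the four sequences from the question, none of the replacement characters is a percent sign, so the order does not matter there.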

I'm also not convinced you need a regex, but if you use one, there's a more general way to URL-decode. Every sequence of the form %XX, where XX is two hex digits, is converted into the character it represents. This can be done with re.sub() like so:

>>> import re
>>> VAR = "%2FlkdMu9zkpE8w7UKDOtkkHhJlYZ6CaEaxqmsA%2B7G3e8%3D&"
>>> re.sub(r'%[0-9A-Fa-f]{2}', lambda x: chr(int(x.group()[1:], 16)), VAR)
'/lkdMu9zkpE8w7UKDOtkkHhJlYZ6CaEaxqmsA+7G3e8=&'

Enjoy.

var = "VAR=s2P0n6I%2Flonpj6uCKvYn8PCjp%2F4PUE2TPsltCdmA%3DRQPY%3D&"
var = var.replace("%2F", "/")
var = var.replace("%2B", "+")
var = var.replace("%3D", "=")

You get the same result with urllib2.unquote (Python 2):

import urllib2
var = "VAR=s2P0n6I%2Flonpj6uCKvYn8PCjp%2F4PUE2TPsltCdmA%3DRQPY%3D&"
var = urllib2.unquote(var)
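
The snippet above targets Python 2. As a sketch for Python 3, where the urllib2 module no longer exists, the same function is available as urllib.parse.unquote. The variable name decoded is only for illustration.

```python
from urllib.parse import unquote

var = "VAR=s2P0n6I%2Flonpj6uCKvYn8PCjp%2F4PUE2TPsltCdmA%3DRQPY%3D&"

# unquote decodes every %XX escape, not only the four from the question.
decoded = unquote(var)
print(decoded)
```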

This can't be done with a regex pattern alone, because there's no way to write a conditional replacement inside the pattern itself. A regular expression only answers the question "Does this string match this pattern?" It does not perform the operation "If this string matches this pattern, replace part of it with this; if it matches that pattern, replace it with that." Choosing a different replacement per match has to be done by the surrounding code, for example a replacement function passed to re.sub().
