
Substitute part of a string in Python using regular expressions

What I want:
original string: (#1 AND #12) OR #10
converted to: (something AND another_something) OR something_another

That is, each #number should be replaced by its own unique string.

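What I did:

import re

# (number, replacement) pairs, same values as in the answers below
filters_array = [(1, "something"), (10, "something_another"), (12, "another_something")]
filter_string = "(#1 AND #12) OR #10"
for fltr in filters_array:
    index = fltr[0]       # the number that follows '#'
    replace_by = fltr[1]  # the string that should replace '#<number>'
    filter_string = re.sub(r'#' + str(index), replace_by, filter_string)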

Output :

(something AND something2) OR something0

The problem: rather than replacing only #1, it also replaces the #1 inside #12 and #10, because those tokens start with #1 as well.
I tried count=1 in re.sub(), but it did not work because my string can also be ' (#12 AND #1) '.

Use the word boundary anchor \b to force an exact number match:

import re

filter_string = "(#1 AND #12) OR #10"
filters_array = [(1, "something"), (10, "something_another"), (12, "another_something")]
for num, s in filters_array:
    # \b after the number stops '#1' from matching the start of '#12' or '#10'
    filter_string = re.sub(r'#' + str(num) + r'\b', s, filter_string)

print(filter_string)

The output:

(something AND another_something) OR something_another

http://www.regular-expressions.info/wordboundaries.html
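To see why the anchor matters, here is a minimal sketch that runs the same loop with and without \b on the ' (#12 AND #1) ' ordering mentioned in the question. The helper name substitute and its anchored flag are made up for this illustration. The lambda replacement is an added precaution, so that a replacement string containing backslashes would be inserted literally.

```python
import re

def substitute(text, filters, anchored=True):
    """Replace each #<number> token in text; filters is a list of (number, replacement)."""
    for num, replacement in filters:
        # Optionally end the pattern with \b so '#1' cannot match inside '#12'.
        pattern = r'#' + str(num) + (r'\b' if anchored else '')
        # A function replacement inserts the string as-is (no backslash processing).
        text = re.sub(pattern, lambda m, r=replacement: r, text)
    return text

filters_array = [(1, "something"), (10, "something_another"), (12, "another_something")]
source = "(#12 AND #1) OR #10"

print(substitute(source, filters_array, anchored=False))
# (something2 AND something) OR something0
print(substitute(source, filters_array, anchored=True))
# (another_something AND something) OR something_another
```

Without the anchor, the first pass for #1 already consumes the prefix of #12 and #10, so the later passes find nothing left to replace.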

You may convert the list of tuples into a dictionary and use a re.sub with a pattern capturing the digit part and then a lambda expression in the replacement argument to find the right value by key:

import re
filter_string = "(#1 AND #12) OR #10"
filters_array = [(1,"something"),(10,"something_another"),(12,"another_something")]
dt = dict(filters_array)
filter_string = re.sub(r'#([0-9]+)', lambda x: dt.get(int(x.group(1)), x.group()), filter_string)
print(filter_string)
# => (something AND another_something) OR something_another

The #([0-9]+) pattern matches # and then captures one or more digits into Group 1. Inside the lambda, the captured digits are converted to an int and used as the dictionary key. If the key is not in the dictionary, the original match (# plus the number) is inserted back into the result.

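The fallback branch is easier to see with a number that has no entry in the dictionary. The following is only a small sketch: the #99 token and the names pattern and result are made up for the illustration.

```python
import re

dt = {1: "something", 10: "something_another", 12: "another_something"}
pattern = re.compile(r'#([0-9]+)')

# findall returns only the captured digits, one entry per token.
print(pattern.findall("(#1 AND #12) OR #99"))
# ['1', '12', '99']

# #99 has no entry in dt, so the whole match (m.group()) is put back unchanged.
result = pattern.sub(lambda m: dt.get(int(m.group(1)), m.group()), "(#1 AND #12) OR #99")
print(result)
# (something AND another_something) OR #99
```

Because [0-9]+ is greedy, #12 is always read as the number 12 and never as #1 followed by 2, which is why this approach needs no word boundary.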

If you need to process the match further, you may want to use a callback function rather than a lambda in the replacement argument:

import re

filters_array = [(1,"something"),(10,"something_another"),(12,"another_something")]
dt = dict(filters_array)

def repl(m):
    return dt.get(int(m.group(1)), m.group())

filter_string = re.sub(r'#([0-9]+)', repl, "(#1 AND #12) OR #10")
print(filter_string)

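As one possible example of such further processing, the callback below also records which numbers had no replacement. This is a sketch under assumptions: the factory make_repl, the missing list and the #7 token are hypothetical names chosen for the illustration.

```python
import re

def make_repl(mapping, missing):
    """Build a re.sub callback; numbers without a replacement are recorded in missing."""
    def repl(m):
        key = int(m.group(1))
        if key in mapping:
            return mapping[key]
        missing.append(key)  # remember the number that had no replacement
        return m.group()     # leave the original '#<number>' in place
    return repl

dt = {1: "something", 10: "something_another", 12: "another_something"}
missing = []
result = re.sub(r'#([0-9]+)', make_repl(dt, missing), "(#1 AND #7) OR #10")

print(result)   # (something AND #7) OR something_another
print(missing)  # [7]
```

Passing the dictionary and the list into a factory keeps the callback free of globals, which makes it easier to reuse with a different mapping.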
