
How to use regex to delete a specific section of text after a regex match

(No, "Python regex, how to delete all matches from a string" doesn't solve my problem.)

Suppose I have this list:

names = ['your name', 'the name', 'his name', 'her name', 'their name', 'employer name', "employer's name", "father's name",
        "mother's name", "maiden name", "son's name", "daughter's name", "brother's name", "sister's name"]

And suppose I have this piece of text:

text = "What is your name?  Well,  uh it's John Smith.  Thanks for asking. Anyway, I'd doing well."

How do I use regex to find every element of the list names in the text, and replace the block of text (say, of length 50) immediately after the element with " [name] "? My output would be:

text = "What is your name [name] Anyway, I'd doing well."

So far I have the code below, but it only replaces the element itself with " [name] ", not the text after the element.

import re

def my_replace3(match):
    # The matched text is ignored; every match becomes the placeholder.
    return " [name] "

def no_name(text):
    names = ['your name', 'the name', 'his name', 'her name', 'their name', 'employer name', "employer's name", "father's name",
        "mother's name", "maiden name", "son's name", "daughter's name", "brother's name", "sister's name"]
    regex = re.compile(r'\b(' + '|'.join(names) + r')\b', re.IGNORECASE)
    text = re.sub(regex, my_replace3, text)
    return text

I'm not a great regex expert, so your help would be greatly appreciated.

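If you want to replace the 50 characters after the match as well, append .{50} to the regexp so that those characters become part of the match.

Then use a back-reference (\1) in the replacement string to copy the matched name into the replacement, so that only the text after it is replaced.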

import re

def no_name(text):
    names = ['your name', 'the name', 'his name', 'her name', 'their name', 'employer name', "employer's name", "father's name",
        "mother's name", "maiden name", "son's name", "daughter's name", "brother's name", "sister's name"]
    # Group 1 captures the name; .{50} consumes the 50 characters that follow it.
    regex = re.compile(r'\b(' + '|'.join(map(re.escape, names)) + r')\b.{50}', re.IGNORECASE)
    # \1 puts the name back; the trailing space matches the " [name] " output asked for.
    text = re.sub(regex, r'\1 [name] ', text)
    return text
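
One caveat with .{50}: it needs exactly 50 characters after the name, and . does not match a newline by default, so a name that sits near the end of the text or is followed by a line break is left untouched. Below is a minimal sketch of a more forgiving variant, assuming it is acceptable to remove up to 50 characters rather than exactly 50. The function name redact_after, the window parameter, and the shortened NAMES list are illustrative and not part of the original code.

```python
import re

NAMES = ['your name', 'the name', 'his name']  # shortened list, for illustration only


def redact_after(text, names=NAMES, window=50):
    """Replace up to `window` characters after any listed name with ' [name] '."""
    alternation = '|'.join(map(re.escape, names))
    # {0,window} still matches when fewer than `window` characters remain;
    # re.DOTALL lets '.' match line breaks as well.
    pattern = r'\b(' + alternation + r')\b.{0,' + str(window) + '}'
    regex = re.compile(pattern, re.IGNORECASE | re.DOTALL)
    return regex.sub(r'\1 [name] ', text)


print(repr(redact_after("What is your name? John.")))
# 'What is your name [name] '
```

Because the quantifier is greedy, this still consumes the full 50 characters whenever they are available, so longer texts behave the same as with .{50}.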

You should also use re.escape() when inserting strings that should be matched exactly into the regexp, in case any of them contain regexp operators.
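
To see why escaping matters, here is a small sketch using a hypothetical entry, "your (full) name", which is not in the original list. Without re.escape() the parentheses are treated as a group, so the pattern looks for "your full name" and misses the literal phrase.

```python
import re

phrase = "your (full) name"  # hypothetical entry containing regex operators

raw = re.compile(r'\b(' + phrase + r')\b', re.IGNORECASE)
escaped = re.compile(r'\b(' + re.escape(phrase) + r')\b', re.IGNORECASE)

text = "Enter your (full) name here"

print(bool(raw.search(text)))      # False: the parentheses are read as a group
print(bool(escaped.search(text)))  # True: the phrase is matched literally
```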
