Python: searching a string for a word that contains ':'

I am trying to check whether a word exists in a string. The problem is that the search word contains the character ':'. The search fails even though I escape the word. In the example below, the search for 'decision :' reports "not exist" although the word does appear in the text.

The match must be exact. For example, if I search for the word 'for', the result must be "not exist" when the text only contains the word 'formated'.

import re
texte ="  hello \n a formated test text   \n decision :   repair \n toto \n titi"
word_list = ['decision :', 'for']
def verif_exist (word_list, paragraph):
   
    exist = False
    for word in word_list:
        exp = re.escape(word)
      
        print(exp)
        if re.search(r"\b%s\b" % exp, paragraph, re.IGNORECASE):
            print("From exist, word detected: " + word)
            exist = True
        if exist == True:
            break
    return exist
if verif_exist(word_list, texte):
    print("exist")
else:
    print("not exist")

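The only change needed is to remove the second \b word boundary that you wrap around the escaped pattern. A \b after ':' cannot match when the next character is a space, because neither of them is a word character. Instead, use a positive lookahead to ensure that whitespace or the end of the string follows the word. The capture group then contains only the word itself.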

import re
texte ="  hello \n a formated test text   \n decision :   repair \n toto \n titi"
word_list = ['decision :', 'for']
def verif_exist (word_list, paragraph):
    for word in word_list:
        exp = re.escape(word)
      
        print(exp)
        # No trailing \b: the word may end in a non-word character such as ':',
        # so require whitespace or the end of the string after it instead.
        if re.search(r"\b(%s)(?=\s|$)" % exp, paragraph, re.IGNORECASE):
            print("From exist, word detected: " + word)
            return True

    return False
if verif_exist(word_list, texte):
    print("exist")
else:
    print("not exist")

The documentation states: "\b matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of word characters." There is no word boundary between ':' and a space, because neither of them is a word character.

You can accept either a word boundary or a whitespace character after the word in your regular expression.
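A minimal sketch of that rule, to make the failure visible. The helper name has_match is made up for this illustration; only re.search comes from the standard library.

```python
import re


def has_match(pattern, text):
    """Return True if the pattern matches anywhere in the text (illustrative helper)."""
    return re.search(pattern, text) is not None


# \b only matches where a word character sits on exactly one side of the position.
after_letter = has_match(r"decision\b", "decision : repair")  # 'n' then ' ': boundary
colon_then_space = has_match(r":\b", "decision : repair")     # ':' then ' ': no boundary
colon_then_letter = has_match(r":\b", "decision :repair")     # ':' then 'r': boundary

print(after_letter, colon_then_space, colon_then_letter)
```

The middle case is the one that breaks the code in the question: the escaped word ends in ':' and the text continues with a space.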

import re

texte = "  hello \n a formated test text   \n decision :   repair \n toto \n titi"
word_list = ['decision :', 'for']


def verif_exist(word_list, paragraph):
    for word in word_list:
        exp = re.escape(word)
        print(exp)
        if re.search(fr"\b{exp}(\b|\s)", paragraph, re.IGNORECASE):
            print("From exist, word detected: " + word)
            return True
    return False


if verif_exist(word_list, texte):
    print("exist")
else:
    print("not exist")

That's still not perfect. Consider what happens if the text ends right after the search word, for example if the text is just 'decision:'. After the colon there is neither a word boundary nor a whitespace character, so we also have to allow the end of the text, giving us:

    if re.search(fr"\b{exp}(\b|\s|$)", paragraph, re.IGNORECASE):

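You might have to treat the word boundary at the beginning of the regular expression in a similar way: if a search word starts with a non-word character, the leading \b only matches when a word character comes directly before it.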
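One possible way to handle both ends the same way is to replace each \b with a negative lookaround that only forbids a word character next to the match. This is a sketch under that assumption, not the code from the answers above; the function name contains_exact and the ':tag' sample are invented for the illustration.

```python
import re


def contains_exact(word, text):
    """Check for word in text with no word character directly before or after it."""
    # (?<!\w): the character before the match, if any, is not a word character.
    # (?!\w):  the character after the match, if any, is not a word character.
    pattern = r"(?<!\w)%s(?!\w)" % re.escape(word)
    return re.search(pattern, text, re.IGNORECASE) is not None


texte = "  hello \n a formated test text   \n decision :   repair \n toto \n titi"

print(contains_exact("decision :", texte))       # word ends in ':' followed by a space
print(contains_exact("for", texte))              # only occurs inside 'formated'
print(contains_exact("decision:", "decision:"))  # text starts and ends with the word
print(contains_exact(":tag", "see :tag here"))   # word starts with a non-word character
```

Note that this is slightly stricter than \b{exp}(\b|\s|$): a text such as 'decision :repair' no longer counts as containing 'decision :', because a word character follows the colon directly. Whether that is wanted depends on what "exact" should mean for your data.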
