Python: search for a word containing ':' in a string

Question

I am trying to check whether a word exists in a string. The problem is that the search word contains the character ':'. The search fails even though I escape the word. In the example below, the search for 'decision :' reports that it does not exist, although it is present in the text.

The match must be exact. For example, if I search for the word 'for', it must report "not exist" when the text only contains a longer word such as 'formatted'.

import re
texte ="  hello \n a formated test text   \n decision :   repair \n toto \n titi"
word_list = ['decision :', 'for']
def verif_exist (word_list, paragraph):
   
    exist = False
    for word in word_list:
        exp = re.escape(word)
      
        print(exp)
        if re.search(r"\b%s\b" % exp, paragraph, re.IGNORECASE):
            print("From exist, word detected: " + word)
            exist = True
        if exist == True:
            break
    return exist
if verif_exist(word_list, texte):
    print("exist")
else:
    print("not exist")

Answer 1

The only change needed is to remove the second \b word boundary wrapped around the escaped pattern. Instead, use a positive lookahead to ensure that the word is followed by whitespace or the end of the string. The parentheses capture only the word itself.

import re
texte ="  hello \n a formated test text   \n decision :   repair \n toto \n titi"
word_list = ['decision :', 'for']
def verif_exist (word_list, paragraph):
    for word in word_list:
        exp = re.escape(word)
      
        print(exp)
        if re.search(r"\b(%s)(?=\s|$)" % exp, paragraph, re.IGNORECASE): # remove second word boundary, as we want to match non word characters after the word (space and colon)
            print("From exist, word detected: " + word)
            return True

    return False
if verif_exist(word_list, texte):
    print("exist")
else:
    print("not exist")

Answer 2

The documentation states: "\b matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of word characters." There is no word boundary between ':' and a space, because neither of them is a word character.
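To make that concrete, here is a minimal check of how \b behaves around the colon. The variable names are only for illustration and the strings are cut down from the question's text:

```python
import re

text = "decision :   repair"

# After ':' comes a space. Both are non-word characters, so there is
# no word boundary at that position and the trailing \b cannot match.
with_trailing_boundary = bool(re.search(r"\bdecision :\b", text))

# Dropping the trailing \b lets the phrase be found.
without_trailing_boundary = bool(re.search(r"\bdecision :", text))

# 'for' is still rejected inside 'formated': 'r' and 'm' are both
# word characters, so there is no boundary between them.
for_inside_word = bool(re.search(r"\bfor\b", "a formated test"))

print(with_trailing_boundary, without_trailing_boundary, for_inside_word)
```

So the trailing \b is only a problem when the search word ends in a non-word character such as ':'.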

Instead, you can accept either a word boundary or a whitespace character after the word in your regular expression.

import re

texte = "  hello \n a formated test text   \n decision :   repair \n toto \n titi"
word_list = ['decision :', 'for']


def verif_exist(word_list, paragraph):
    for word in word_list:
        exp = re.escape(word)
        print(exp)
        if re.search(fr"\b{exp}(\b|\s)", paragraph, re.IGNORECASE):
            print("From exist, word detected: " + word)
            return True
    return False


if verif_exist(word_list, texte):
    print("exist")
else:
    print("not exist")

That's still not perfect. Consider what happens if your text is just 'decision:'. After the colon there is neither a word boundary nor whitespace, so we also have to check for the end of the text, which gives us:

    if re.search(fr"\b{exp}(\b|\s|$)", paragraph, re.IGNORECASE):

You might also have to do something similar for the word boundary at the beginning of your regular expression, since a leading \b fails in the same way when the search word starts with a non-word character.
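One way to treat both ends the same is to swap \b for negative lookarounds that only require "no word character directly before or after". This is a different technique from the alternation above, shown as a sketch. The helper name contains_exact is made up for this example:

```python
import re


def contains_exact(phrase, text):
    """Return True if phrase occurs in text with no word character
    directly before or after it (hypothetical helper, not from the question)."""
    # (?<!\w): the character before the match, if any, is not a word character.
    # (?!\w):  the character after the match, if any, is not a word character.
    pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
    return re.search(pattern, text, re.IGNORECASE) is not None


texte = "  hello \n a formated test text   \n decision :   repair \n toto \n titi"

print(contains_exact("decision :", texte))      # phrase ending in ':'
print(contains_exact("for", texte))             # only occurs inside 'formated'
print(contains_exact("decision:", "decision:")) # phrase at the very end of the text
print(contains_exact(":tag", "a :tag b"))       # phrase starting with ':'
```

One difference to be aware of: with this pattern 'decision :' is not found in 'decision :repair', because a word character follows the colon directly. Whether that is the behaviour you want depends on your data.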

2 answers

solution1
1 2021-06-17 14:47:11

solution2
0 ACCEPTED 2021-06-17 14:54:42
