How to use regex to find a specific word in text and return all occurrences?

Question

Just like the question title.

I'm new to Python and regular expressions. I need to search for a specific word in a paragraph and show the indices of all its occurrences.

For example:

the paragraph is:

This is a testing text and used to test and test and test.

and the word:

test

The algorithm should return the indices of the 3 non-overlapping occurrences of the word test in the above paragraph (but not testing, because I mean a whole-word search, not just a substring match).

Another example with the same paragraph and this "word":

test and

The algorithm should return 2 occurrences of test and.

I guess I must use a regular expression to match the whole word, where the preceding and following characters may be punctuation such as . , ; ? -

After Googling I found that something like re.finditer should be used, but I haven't figured out the right way to use it. Please help, thank you in advance. ;)

Answer 1

Yes, finditer is the way to go. Use start() to find the index of the match.

Example:

import re

a = "This is a testing text and used to test and test and test."
# \b anchors the match at word boundaries, so "testing" is skipped
print([m.start() for m in re.finditer(r"\btest\b", a)])
print([m.start() for m in re.finditer(r"\btest and\b", a)])

Output:

[35, 44, 53]
[35, 44]
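If the search word comes from user input rather than being hard-coded, the same idea can be wrapped in a small helper. The following is only a sketch: the name find_word_indices is made up for illustration, and it assumes the word starts and ends with a letter, digit, or underscore, since \b only matches next to such characters.

```python
import re

def find_word_indices(word, text):
    """Return start indices of whole-word, non-overlapping matches of `word`."""
    # re.escape keeps characters such as "." or "?" in `word` literal.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    return [m.start() for m in pattern.finditer(text)]

paragraph = "This is a testing text and used to test and test and test."
print(find_word_indices("test", paragraph))      # [35, 44, 53]
print(find_word_indices("test and", paragraph))  # [35, 44]
```

Because finditer scans left to right and resumes after each match, the results are non-overlapping, which is what the question asks for.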

Answer 2

Use the word boundary anchor \b in your regex to indicate that the match must start and end at a word boundary.

>>> import re
>>> sentence = "This is a testing text and used to test and test and test."
>>> pattern = re.compile(r'\btest\b')
>>> [m.start() for m in pattern.finditer(sentence)]
[35, 44, 53]
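Regarding the punctuation mentioned in the question: \b matches between a word character and a non-word character, so . , ; ? and - all count as boundaries without being listed explicitly. A short sketch with a made-up sample string, using span() to get both the start and end index of each match:

```python
import re

# Hypothetical sample text with punctuation around the word.
text = "test, test; test? pre-test testing."
pattern = re.compile(r"\btest\b")

# span() returns (start, end) for each match; end is exclusive.
spans = [m.span() for m in pattern.finditer(text)]
print(spans)  # [(0, 4), (6, 10), (12, 16), (22, 26)]
```

Note that the test in pre-test is matched because the hyphen is a non-word character, while testing is not, because the i that follows test is a word character.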

solution1
6 ACCEPTED 2012-08-10 14:19:20

solution2
3 2012-08-10 14:13:08