Search in a string and obtain the 2 words before and after the match in Python

Question

I'm using Python to search for some words (possibly multi-token) in a description (a string).

To do that I'm using a regex like this:

    import re

    result = re.search(word, description, re.IGNORECASE)
    if result:
        print("Found: " + result.group())

But what I need is to obtain the first 2 words before and after the match. For example, if I have something like this:

Parking here is horrible, this shop sucks.

"here is" is the phrase that I am looking for. After matching it with my regex, I need the 2 words (if they exist) before and after the match.

In the example: Parking here is horrible, this

"Parking" and "horrible, this" are the words that I need.

ATTENTION: The description can be very long and the pattern "here is" can appear multiple times.

Answer 1

How about string operations?

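    line = 'Parking here is horrible, this shop sucks.'

    # Split the line around the first occurrence of the phrase.
    before, term, after = line.partition('here is')
    # Keep at most the last two words before and the first two words after.
    before = before.rsplit(maxsplit=2)[-2:]
    after = after.split(maxsplit=2)[:2]

Result:

    >>> before
    ['Parking']
    >>> after
    ['horrible,', 'this']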

Answer 2

Try this regex: ((?:[a-z,]+\s+){0,2})here is\s+((?:[a-z,]+\s*){0,2})

Use it with re.findall and the re.IGNORECASE flag set.
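For reference, here is a minimal sketch of how that regex could be used from Python. It assumes the character class is meant to be [a-z,] and that the phrase "here is" is hard-coded; the helper name context_words is made up for illustration. Note that this character class does not match words containing digits, apostrophes or periods.

```python
import re

# Pattern from the answer above, assuming the character class is [a-z,].
# The search phrase "here is" is hard-coded in the pattern.
pattern = re.compile(
    r"((?:[a-z,]+\s+){0,2})here is\s+((?:[a-z,]+\s*){0,2})",
    re.IGNORECASE,
)

def context_words(text):
    """Return (words_before, words_after) for each non-overlapping match.

    context_words is a hypothetical helper name, not part of the answer.
    """
    return [(before.split(), after.split())
            for before, after in pattern.findall(text)]

matches = context_words("Parking here is horrible, this shop sucks.")
print(matches)
```

Because re.findall returns non-overlapping matches, the words captured after one occurrence are not available as "before" words for the next one.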

Answer 3

Based on your clarification, this becomes a bit more complicated. The solution below deals with scenarios where the searched pattern may in fact also be in the two preceding or two subsequent words.

    import re

    line = "Parking here is horrible, here is great here is mediocre here is here is "
    print(line)
    pattern = "here is"
    r = re.search(pattern, line, re.IGNORECASE)
    output = []
    if r:
        # Note: str.partition is case-sensitive, unlike the re.search check above.
        while line:
            # Split off the text before the next occurrence; the rest stays in line.
            before, match, line = line.partition(pattern)
            if match:
                if not output:
                    before = before.split()[-2:]
                else:
                    # The previous occurrence was consumed, so put it back in front.
                    before = ' '.join([pattern, before]).split()[-2:]
                after = line.split()[:2]
                output.append((before, after))
    print(output)

    [(['Parking'], ['horrible,', 'here']), (['is', 'horrible,'], ['great', 'here']), (['is', 'great'], ['mediocre', 'here']), (['is', 'mediocre'], ['here', 'is']), (['here', 'is'], [])]
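As a rough alternative (not the approach used in this answer), the same list can be produced with re.finditer by slicing the text around each match. This is only a sketch: the function name words_around and the parameter n are made up for illustration, and re.escape is used on the assumption that the phrase should be matched literally.

```python
import re

def words_around(phrase, text, n=2):
    """Yield (before, after) word lists for every occurrence of phrase.

    words_around and n are hypothetical names; n is assumed to be >= 1.
    """
    # re.escape makes the phrase match literally; remove it to allow regex syntax.
    for m in re.finditer(re.escape(phrase), text, re.IGNORECASE):
        before = text[:m.start()].split()[-n:]  # last n words before the match
        after = text[m.end():].split()[:n]      # first n words after the match
        yield before, after

line = "Parking here is horrible, here is great here is mediocre here is here is "
result = list(words_around("here is", line))
print(result)
```

Each match re-splits the text on both sides, so for a very long description with many occurrences this may be slow; it also matches the phrase inside longer words unless word boundaries are added to the pattern.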

Answer 4

I would do it like this (edit: added anchors to cover most cases):

(\S+\s+|^)(\S+\s+|)here is(\s+\S+|)(\s+\S+|$)

This way you will always get 4 groups (which may need to be trimmed of whitespace) with the following behavior:

If group 1 is empty, there was no word before (group 2 is empty too)
If group 2 is empty, there was only one word before (group 1)
If group 1 and 2 are not empty, they are the words before in order
If group 3 is empty, there was no word after
If group 4 is empty, there was only one word after
If group 3 and 4 are not empty, they are the words after in order
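The group behavior listed above can be checked with a short sketch like the following. It assumes re.search on the first occurrence only, with "here is" hard-coded in the pattern; the helper name surrounding_words is made up for illustration.

```python
import re

# Regex from the answer above, with the phrase "here is" hard-coded.
pattern = re.compile(r"(\S+\s+|^)(\S+\s+|)here is(\s+\S+|)(\s+\S+|$)",
                     re.IGNORECASE)

def surrounding_words(text):
    """Return (before, after) word lists for the first match, or None.

    surrounding_words is a hypothetical helper name, not part of the answer.
    """
    m = pattern.search(text)
    if m is None:
        return None
    # Trim whitespace from each group, then drop the groups that were empty.
    words = [g.strip() for g in m.groups()]
    before = [w for w in words[:2] if w]
    after = [w for w in words[2:] if w]
    return before, after

first = surrounding_words("Parking here is horrible, this shop sucks.")
print(first)
```

With only one word before the phrase, that word ends up in group 1 and group 2 is empty, which matches the description above.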

4 answers

solution1
2 2015-07-30 01:11:25

solution2
1 2015-07-30 03:08:17

solution3
0 2015-07-30 03:30:43

solution4
0 ACCEPTED 2015-07-30 04:09:14
