Find text between list of keywords and point with RegEx in Python

Question

# coding=utf-8
import re

m = "Hola esto es un ejemplo Objeto: esta es una de, las palabras."

keywords = ['Objeto:', 'OBJETO', 'Objeto social:', 'Objetos']

obj = re.compile(r'\b(?:{})\b\s*(.*?),'.format('|'.join(map(re.escape, keywords))))
print(obj.findall(m))

I want to print the text between any one of the keywords and the next period. The output I want in this case is: "esta es una de, las palabras."

Answer 1

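The trailing \b prevents the match because your keyword ends with ":". The colon and the space that follows it are both non-word characters, so there is no word boundary between them.

Simplify your regex by removing it. Also, the lazy group followed by a comma, (.*?), only extracts the part before the first comma. I suppose you meant "up to the next period": (.*?)\.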

obj = re.compile(r'\b(?:{})\s*(.*?)\.'.format('|'.join(map(re.escape, keywords))))

result:

['esta es una de, las palabras']
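The word-boundary behaviour described above can be checked in isolation. This is only a minimal sketch: it reuses the sentence from the question and tests the single keyword "Objeto:" rather than the full alternation, and the variable names are illustrative.

```python
import re

text = "Hola esto es un ejemplo Objeto: esta es una de, las palabras."

# ':' and the space after it are both non-word characters,
# so \b cannot match between them and the search fails.
with_boundary = re.search(r'Objeto:\b', text)

# Without the trailing \b the keyword is found.
without_boundary = re.search(r'Objeto:', text)

print(with_boundary)             # None
print(without_boundary.group())  # Objeto:
```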

Removing the word boundary means a keyword can also match the start of a longer word, though (for example, OBJETO inside OBJETOS). You could require a non-word character with \W after the keyword, and it would work (acting like a word boundary):

obj = re.compile(r'\b(?:{})\W\s*(.*?)\.'.format('|'.join(map(re.escape, keywords))))
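To see the difference between the two patterns, here is a small comparison. The sample sentence is made up for illustration (it is not from the question), and the names loose and strict are just labels for the two regexes above.

```python
import re

keywords = ['Objeto:', 'OBJETO', 'Objeto social:', 'Objetos']
alternation = '|'.join(map(re.escape, keywords))

# Pattern without any check after the keyword.
loose = re.compile(r'\b(?:{})\s*(.*?)\.'.format(alternation))
# Pattern that requires one non-word character after the keyword.
strict = re.compile(r'\b(?:{})\W\s*(.*?)\.'.format(alternation))

# Hypothetical text: "OBJETOS" starts with the keyword "OBJETO".
sample = "OBJETOS perdidos. OBJETO la venta de libros."

print(loose.findall(sample))   # ['S perdidos', 'la venta de libros']
print(strict.findall(sample))  # ['la venta de libros']
```

Note that \W consumes one character, so with the strict pattern a keyword at the very end of the string, or one followed directly by a letter, will not match.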

Answer 2

Use \b(?:{0})\s*(.*?)(?=\b(?:{0})|$) with a lookahead instead:

import re
m = "Hola esto es un ejemplo Objeto: esta es una de, las palabras."
keywords = ['Objeto:', 'OBJETO', 'Objeto social:', 'Objetos']
obj = re.compile(r'\b(?:{0})\s*(.*?)(?=\b(?:{0})|$)'.format('|'.join(map(re.escape, keywords))))
print(obj.findall(m))

This outputs:

['esta es una de, las palabras.']
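The lookahead stops each capture at the next keyword or at the end of the string, so the same pattern can split a text that contains several keywords. A rough sketch, assuming a made-up input with two keywords (the text and the strip() step are additions for illustration):

```python
import re

keywords = ['Objeto:', 'OBJETO', 'Objeto social:', 'Objetos']
alternation = '|'.join(map(re.escape, keywords))
obj = re.compile(r'\b(?:{0})\s*(.*?)(?=\b(?:{0})|$)'.format(alternation))

# Hypothetical text containing two different keywords.
text = "Objeto: vender libros. Objeto social: comprar, y vender."

# Each capture runs up to the next keyword, so it may keep trailing
# whitespace; strip() removes it.
segments = [s.strip() for s in obj.findall(text)]
print(segments)  # ['vender libros.', 'comprar, y vender.']
```

Unlike the pattern in Answer 1, this keeps the final period and does not stop at periods inside a segment.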

2 answers

solution1
2 ACCEPTED 2018-07-25 08:16:29

solution2
1 2018-07-25 08:16:59
