# coding=utf-8
import re
m = "Hola esto es un ejemplo Objeto: esta es una de, las palabras."
keywords = ['Objeto:', 'OBJETO', 'Objeto social:', 'Objetos']
obj = re.compile(r'\b(?:{})\b\s*(.*?),'.format('|'.join(map(re.escape, keywords))))
print(obj.findall(m))
I want to print the text between any one of the keywords and the next period. The output I want in this case: "esta es una de, las palabras."
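The trailing \b prevents the match because your keyword ends with ":". There is no word boundary between ":" and the space that follows it, since both are non-word characters. Simplify your regex by removing it. Also, the lazy group followed by a comma, written (.*?), in your pattern, only captures the text up to the first comma. I suppose you meant "up to the next period": (.*?)\.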
obj = re.compile(r'\b(?:{})\s*(.*?)\.'.format('|'.join(map(re.escape, keywords))))
result:
['esta es una de, las palabras']
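To see the word-boundary issue in isolation, here is a minimal sketch that tests a single keyword with and without the trailing \b. The variable names (text, with_boundary, without_boundary) are only illustrative and are not part of the original question.

```python
import re

text = "Hola esto es un ejemplo Objeto: esta es una de, las palabras."

# ':' and the following space are both non-word characters,
# so \b cannot match between them and the whole pattern fails.
with_boundary = re.search(r'\bObjeto:\b', text)

# Without the trailing \b the keyword is found as expected.
without_boundary = re.search(r'\bObjeto:', text)

print(with_boundary)             # None
print(without_boundary.group())  # Objeto:
```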
Removing the word boundary means a keyword can also match the beginning of a longer word, though. You could require a non-word character with \W right after the keyword, and it would work (acting like a word boundary):
obj = re.compile(r'\b(?:{})\W\s*(.*?)\.'.format('|'.join(map(re.escape, keywords))))
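The difference between the two patterns can be checked with a short sketch. The sample strings below are made up for illustration; the keyword list is the one from the question.

```python
import re

keywords = ['Objeto:', 'OBJETO', 'Objeto social:', 'Objetos']
alternation = '|'.join(map(re.escape, keywords))

# Pattern without any boundary after the keyword.
loose = re.compile(r'\b(?:{})\s*(.*?)\.'.format(alternation))
# Pattern that requires a non-word character after the keyword.
strict = re.compile(r'\b(?:{})\W\s*(.*?)\.'.format(alternation))

partial = "OBJETOS del contrato."  # 'OBJETO' is only a prefix of 'OBJETOS'
exact = "OBJETO del contrato."     # 'OBJETO' is a whole word here

loose_partial = loose.findall(partial)
strict_partial = strict.findall(partial)
strict_exact = strict.findall(exact)

print(loose_partial)   # ['S del contrato']
print(strict_partial)  # []
print(strict_exact)    # ['del contrato']
```

The loose pattern treats the "OBJETO" prefix of "OBJETOS" as a keyword and captures the leftover "S", while the \W version rejects it.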
Use \b(?:{0})\s*(.*?)(?=\b(?:{0})|$) with a lookahead instead:
import re
m = "Hola esto es un ejemplo Objeto: esta es una de, las palabras."
keywords = ['Objeto:', 'OBJETO', 'Objeto social:', 'Objetos']
obj = re.compile(r'\b(?:{0})\s*(.*?)(?=\b(?:{0})|$)'.format('|'.join(map(re.escape, keywords))))
print(obj.findall(m))
This outputs:
['esta es una de, las palabras.']
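As a rough illustration of what the lookahead buys you, the sketch below runs the same pattern on a made-up string that contains two keywords. Each capture stops right before the next keyword, or at the end of the string. The sample text and the strip() cleanup are assumptions added for the example.

```python
import re

keywords = ['Objeto:', 'OBJETO', 'Objeto social:', 'Objetos']
alternation = '|'.join(map(re.escape, keywords))
obj = re.compile(r'\b(?:{0})\s*(.*?)(?=\b(?:{0})|$)'.format(alternation))

# Hypothetical input with two keywords in one line.
text = "Objeto: primera parte. OBJETO segunda parte."

# The lazy group keeps the whitespace before the next keyword, so strip it.
sections = [s.strip() for s in obj.findall(text)]

print(sections)  # ['primera parte.', 'segunda parte.']
```

Note that "." does not match newlines by default, so for multi-line input you would likely need the re.DOTALL flag as well.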