Find text between list of keywords and point with RegEx in Python

# coding=utf-8
import re

m = "Hola esto es un ejemplo Objeto: esta es una de, las palabras."

keywords = ['Objeto:', 'OBJETO', 'Objeto social:', 'Objetos']

obj = re.compile(r'\b(?:{})\b\s*(.*?),'.format('|'.join(map(re.escape, keywords))))
print(obj.findall(m))

I want to print the text between any one of the keywords and the next period. The output I want in this case is: "esta es una de, las palabras."

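The trailing \b prevents the match because your keyword ends with ":". A colon followed by a space is two non-word characters in a row, so there is no word boundary between them.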

Simplify your regex by removing it. Also, the lazy group followed by a comma, (.*?), only captures the text up to the first comma. I suppose you meant "up to the next period": (.*?)\.

obj = re.compile(r'\b(?:{})\s*(.*?)\.'.format('|'.join(map(re.escape, keywords))))

Result:

['esta es una de, las palabras']
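To see why the trailing \b fails here, a minimal sketch can test the boundary directly. The variable names are made up for illustration, and the pattern is reduced to a single keyword:

```python
import re

text = "Objeto: esta es una de, las palabras."

# ':' and the space after it are both non-word characters, so there is
# no word boundary between them and the trailing \b cannot succeed.
with_boundary = re.search(r'Objeto:\b', text)
print(with_boundary)  # None

# Without the trailing \b the keyword is found.
without_boundary = re.search(r'Objeto:', text)
print(without_boundary.group())  # Objeto:

# \b only succeeds after ':' when a word character follows directly.
glued = re.search(r'Objeto:\b', "Objeto:esta")
print(glued.group())  # Objeto:
```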

Removing the word boundary means a keyword can also match as the start of a longer word, though (for example, OBJETO inside OBJETOS). You could require a non-word character with \W after the keyword instead; it then acts like a word boundary:

obj = re.compile(r'\b(?:{})\W\s*(.*?)\.'.format('|'.join(map(re.escape, keywords))))
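The difference between the two patterns can be checked with a small sketch. The sample sentence is invented for illustration, and no case-insensitive flag is assumed:

```python
import re

keywords = ['Objeto:', 'OBJETO', 'Objeto social:', 'Objetos']
alternation = '|'.join(map(re.escape, keywords))

# Pattern without any check after the keyword.
no_boundary = re.compile(r'\b(?:{})\s*(.*?)\.'.format(alternation))
# Pattern that requires one non-word character after the keyword.
with_nonword = re.compile(r'\b(?:{})\W\s*(.*?)\.'.format(alternation))

sample = "Los OBJETOS perdidos. OBJETO: uno, dos."

# 'OBJETO' matches as a prefix of 'OBJETOS', and the ':' leaks into the capture.
loose = no_boundary.findall(sample)
print(loose)   # ['S perdidos', ': uno, dos']

# \W rejects 'OBJETOS' and consumes the ':' after 'OBJETO'.
strict = with_nonword.findall(sample)
print(strict)  # ['uno, dos']
```

Note that \W consumes one character, so a keyword such as "Objeto:" followed immediately by a letter (no space) would not match with this variant.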

Alternatively, use \b(?:{0})\s*(.*?)(?=\b(?:{0})|$), which relies on a lookahead to capture everything up to the next keyword or the end of the string:

import re
m = "Hola esto es un ejemplo Objeto: esta es una de, las palabras."
keywords = ['Objeto:', 'OBJETO', 'Objeto social:', 'Objetos']
obj = re.compile(r'\b(?:{0})\s*(.*?)(?=\b(?:{0})|$)'.format('|'.join(map(re.escape, keywords))))
print(obj.findall(m))

This outputs:

['esta es una de, las palabras.']
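The two answers behave differently once the text contains more than one sentence or keyword. The following sketch compares them on an invented sample string; which behavior is preferable depends on what "the next point" should mean for your data:

```python
import re

keywords = ['Objeto:', 'OBJETO', 'Objeto social:', 'Objetos']
alternation = '|'.join(map(re.escape, keywords))

# Stops at the first period after the keyword (period not included).
to_period = re.compile(r'\b(?:{0})\s*(.*?)\.'.format(alternation))
# Stops at the next keyword or at the end of the string.
to_next_keyword = re.compile(
    r'\b(?:{0})\s*(.*?)(?=\b(?:{0})|$)'.format(alternation))

sample = "Objeto: venta de pan. Nota aparte. Objeto social: comercio."

period_result = to_period.findall(sample)
print(period_result)   # ['venta de pan', 'comercio']

keyword_result = to_next_keyword.findall(sample)
print(keyword_result)  # ['venta de pan. Nota aparte. ', 'comercio.']
```

The lookahead version keeps periods and any trailing whitespace in the captured text, so calling strip() on each result may be useful.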
