
Looking for a regex pattern for capturing phrases until dot

I have a long text like this:

text = 'Quisiera yo detectar los puntos... pero solo los puntos aislados. Los puntos suspensivos no los quiero detectar. A eso me refiero.'

and I want to get this output:

phrases = ['Quisiera yo detectar los puntos... pero solo los puntos aislados.',
' Los puntos suspensivos no los quiero detectar.',
' A eso me refiero.']

The problem is the three dots in the first phrase. I can't find a regex that distinguishes them from the ordinary single-dot separator. Is there a way to achieve this with a regex?

You want to handle .. (or ..., etc.) differently from a single dot: treat any run of two or more dots as part of the phrase, and let only a lone dot end it:

(?:[^.]|\.{2,})+\.

Explanation:

  • (?:[^.]|\.{2,})+ will match any string that consists of non-. characters or groups of 2 or more .s
  • \. requires a period, of course

Here's a demo.
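As a minimal sketch of how this pattern could be applied in Python, assuming re.findall is used to collect the matches (the variable names are illustrative, not part of the answer). Each match keeps its leading space, as in the output the question asks for:

```python
import re

text = ('Quisiera yo detectar los puntos... pero solo los puntos aislados. '
        'Los puntos suspensivos no los quiero detectar. A eso me refiero.')

# Repeated non-dot characters or runs of 2+ dots, followed by one closing dot.
pattern = re.compile(r'(?:[^.]|\.{2,})+\.')

# No capturing groups, so findall returns the full text of each match.
phrases = pattern.findall(text)
for phrase in phrases:
    print(repr(phrase))
```

Because [^.] also matches whitespace, the space after each closing dot becomes the start of the next match; calling strip() on each phrase would remove it if that is not wanted.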

You can use a positive lookbehind to split only on whitespace that follows a single dot, that is, a dot whose preceding character is not itself a dot. Whitespace after any sequence of 2 or more dots is ignored.

For example:

import re

s = 'Quisiera yo detectar los puntos... pero solo los puntos aislados. Los puntos suspensivos no los quiero detectar. A eso me refiero.'

sentences = re.split(r'(?<=[^.]\.)\s', s)
print(sentences)
# ['Quisiera yo detectar los puntos... pero solo los puntos aislados.', 'Los puntos suspensivos no los quiero detectar.', 'A eso me refiero.']
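One consequence worth seeing in a small sketch: because whitespace after 2 or more dots is never a split point, a sentence that ends in an ellipsis stays joined to the one that follows it. The sample string below is a made-up input, not taken from the question:

```python
import re

# Same pattern as above: whitespace preceded by a non-dot character plus one dot.
splitter = re.compile(r'(?<=[^.]\.)\s')

# Hypothetical input whose first sentence ends in an ellipsis.
sample = 'Espera... Luego seguimos. Fin.'

parts = splitter.split(sample)
print(parts)
# ['Espera... Luego seguimos.', 'Fin.']
```

Whether that is acceptable depends on the text. For the question's input, where the ellipsis sits in the middle of a sentence, it is exactly the desired behaviour.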

Try this...

import re

text = 'Quisiera yo detectar los puntos... pero solo los puntos aislados. Los puntos suspensivos no los quiero detectar. A eso me refiero.'

# Split on whitespace that follows a dot and precedes an ASCII capital letter.
pattern = r"(?<=\.)\s(?=[A-Z])"
print(re.split(pattern, text))

The result should be...

['Quisiera yo detectar los puntos... pero solo los puntos aislados.',
 'Los puntos suspensivos no los quiero detectar.',
 'A eso me refiero.']

My answer is based on this SO answer.
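A short sketch of how the capital-letter lookahead behaves on two made-up inputs (neither comes from the question). It splits after an ellipsis when a capital follows, and it does not split before a sentence that starts with an accented capital, since [A-Z] only covers ASCII letters:

```python
import re

splitter = re.compile(r"(?<=\.)\s(?=[A-Z])")

# Hypothetical inputs, chosen only to illustrate the lookahead.
after_ellipsis = 'Espera... Luego seguimos. Fin.'
accented_start = 'Llegamos tarde. Él ya no estaba.'

ascii_parts = splitter.split(after_ellipsis)
accented_parts = splitter.split(accented_start)

print(ascii_parts)
# ['Espera...', 'Luego seguimos.', 'Fin.']
print(accented_parts)
# ['Llegamos tarde. Él ya no estaba.']
```

For Spanish text, widening the character class (for example to include Á, É, Í, Ó, Ú, Ñ, ¿ and ¡) is one possible adjustment. That is an assumption about the input, not something the original answer covers.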

Update:
Looking through some of the answers under the regex tag, I came across this meta discussion as well as this answer. My answer did not come from an innate knowledge of regular expressions but rather from spending about 17 minutes googling different search terms and poking around Stack Overflow. In the 17 minutes or so it took me to craft my answer, the other two answers showed up.
I realized that my answer was more of a "show me the code" answer than a "teach a man to fish" one. To sum up my sentiments: when I'm in acute need of help, I want someone to show me the code. Being able to google for solutions to problems is an important skill, but it is also a terrible drug. Hopefully my solution helped, but I would also strongly recommend checking out the links in my update, if only for the perspective on the state of the regex tag and on making Stack Overflow more meaningful.

