
Punctuation not detected between words with no space

How can I split text into sentences when a punctuation mark (., ? or !) sits between two words with no space after it?

Example:

>>> splitText = re.split(r"(?<=[.?!])\s+", "This is an example. Not working as expected.Because there isn't a space after dot.")

output:

['This is an example.', 
"Not working as expected.Because there isn't a space after dot."] 

expected:

['This is an example.', 
'Not working as expected.', 
"Because there isn't a space after dot."]

splitText = re.split(r"[.?!]\s*", "This is an example. Not working as expected.Because there isn't a space after dot.")

+ matches one or more of something, * matches zero or more.
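
As a rough illustration of the + versus * point, the sketch below runs both quantifiers on the question's text. The variable names are made up for this example, and the last variant (the question's lookbehind combined with \s*) assumes Python 3.7 or later, where re.split also splits on empty matches.

```python
import re

text = ("This is an example. Not working as expected."
        "Because there isn't a space after dot.")

# \s+ needs at least one whitespace character, so "expected.Because"
# is not a split point.
with_plus = re.split(r"[.?!]\s+", text)

# \s* also accepts zero whitespace characters, so every mark splits.
# The mark itself is consumed, and the match at the very end of the
# string leaves a trailing empty string.
with_star = re.split(r"[.?!]\s*", text)

# Lookbehind keeps the mark in the sentence (assumes Python 3.7+);
# the empty trailing item is filtered out.
kept = [s for s in re.split(r"(?<=[.?!])\s*", text) if s]

print(with_plus)
print(with_star)
print(kept)
```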

If you need to keep the punctuation, you probably don't want to split at all; instead you could do:

splitText = re.findall(".*?[.?!]", "This is an example. Not working as expected.Because there isn't a space after dot.")

which gives

['This is an example.',
 ' Not working as expected.',
 "Because there isn't a space after dot."]

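You can trim the leading spaces by adjusting the regex (e.g. r'\s*(.*?[.?!])', where the capture group makes findall return only the part after the whitespace) or by calling .strip() on each item.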
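
A minimal sketch of the trimming step, assuming the same input text as above; the names text and sentences are illustrative only.

```python
import re

text = ("This is an example. Not working as expected."
        "Because there isn't a space after dot.")

# Lazily match up to and including each mark, then strip the
# whitespace left over from the gap between sentences.
sentences = [s.strip() for s in re.findall(r".*?[.?!]", text)]

print(sentences)
```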

Use https://regex101.com/r/icrJNl/3/ .

import re
from pprint import pprint

split_text = re.findall(".*?[?.!]", "This is an example! Working as "
                        "expected?Because.")

pprint(split_text)

Note: .*? is a lazy (non-greedy) quantifier, as opposed to .*, which is greedy.

Output:

['This is an example!', 
 ' Working as expected?', 
 'Because.']
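
To see why the lazy form matters here, this small comparison runs both quantifiers on the same string; the names lazy and greedy are just for the example.

```python
import re

text = "This is an example! Working as expected?Because."

# Lazy: stop at the first punctuation mark each time.
lazy = re.findall(r".*?[?.!]", text)

# Greedy: run to the last punctuation mark, so the whole
# string comes back as a single match.
greedy = re.findall(r".*[?.!]", text)

print(lazy)
print(greedy)
```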

Another solution:

import re
from pprint import pprint

split_text = re.split("([?.!])", "This is an example! Working as "
    "expected?Because.")

pprint(split_text)

Output:

['This is an example', 
'!', 
' Working as expected', 
'?', 
'Because', 
'.', 
'']
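
Since the capturing group returns each mark as a separate item, the pieces still need to be paired up to get whole sentences back. One possible way to do that, sketched with illustrative names:

```python
import re

text = "This is an example! Working as expected?Because."

parts = re.split(r"([?.!])", text)

# Even indices hold the sentence bodies, odd indices hold the marks.
# zip() stops at the shorter slice, which drops the trailing ''.
sentences = [(body + mark).strip()
             for body, mark in zip(parts[0::2], parts[1::2])]

print(sentences)
```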
