
Matching sentences with regex in Java

I'm using the Scanner class in Java to go through a text file and extract each sentence. I'm calling the useDelimiter method on my Scanner with the regex:

Pattern.compile("[\\w]*[\\.|?|!][\\s]")

This currently seems to work, but it leaves the whitespace at the end of the sentence. Is there an easy way to match the whitespace at the end but not include it in the result?

I realize this is probably an easy question, but I've never used regex before, so go easy :)

Try this:

"(?<=[.!?])\\s+"

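This uses a positive lookbehind to match \s+ (written \\s+ in a Java string literal) only when it is preceded by one of [.!?]. The punctuation is checked but not consumed, so it stays attached to the sentence.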
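As a minimal sketch of how this delimiter behaves with Scanner.useDelimiter: the class name SentenceScanner and the helper sentences are made up for illustration, and the Scanner reads from a String here rather than from a file as in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class SentenceScanner {
    // Hypothetical helper: splits text into sentences, keeping the closing punctuation.
    static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            // Delimiter: whitespace that directly follows '.', '!' or '?'.
            // The lookbehind checks the punctuation without consuming it.
            scanner.useDelimiter("(?<=[.!?])\\s+");
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [Hello there., How are you?, Fine!]
        System.out.println(sentences("Hello there. How are you? Fine!"));
    }
}
```

For a file, the same delimiter should work with a Scanner constructed from a File or Path instead of a String.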


If you want to remove the punctuation as well, just include it as part of the match:

"[.!?]+\\s+"

This will split "ORLY!?!? LOL" into "ORLY" and "LOL".
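A small sketch of that split, using String.split for brevity instead of a Scanner (the class name PunctuationSplit and the helper split are invented for this example):

```java
import java.util.Arrays;
import java.util.List;

public class PunctuationSplit {
    // Hypothetical helper: splits on a run of sentence punctuation followed by whitespace,
    // so both the punctuation and the whitespace are dropped from the results.
    static List<String> split(String text) {
        return Arrays.asList(text.split("[.!?]+\\s+"));
    }

    public static void main(String[] args) {
        // prints [ORLY, LOL]
        System.out.println(split("ORLY!?!? LOL"));
    }
}
```

Note that punctuation at the very end of the input is not followed by whitespace, so it is not matched and stays on the last piece: "Hi. Bye." gives "Hi" and "Bye.".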

What you're looking for is a positive lookahead. This should do it:

Pattern.compile("\\w*[.?!](?=\\s)")
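To see what the lookahead does, here is a minimal sketch that runs this pattern with Matcher.find and collects the matches (the class name LookaheadDemo and the helper matches are assumptions for illustration). The whitespace is required by (?=\\s) but is not part of the matched text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // Word characters, then one of . ? !, only if whitespace follows.
    private static final Pattern END_OF_SENTENCE = Pattern.compile("\\w*[.?!](?=\\s)");

    // Hypothetical helper: returns every substring the pattern matches.
    static List<String> matches(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = END_OF_SENTENCE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // prints [world., now!]
        System.out.println(matches("Hello world. Bye now! End"));
    }
}
```

Because \\w* is part of the match, the word right before the punctuation is matched too. Used as a Scanner delimiter, this pattern would therefore consume that word along with the punctuation, so it is worth checking the tokens you get back.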
