Matching sentences with regex in Java

I'm using the Scanner class in Java to go through a text file and extract each sentence. I'm calling the useDelimiter method on my Scanner with the regex:

Pattern.compile("[\\w]*[\\.|?|!][\\s]")

This currently seems to work, but it leaves the whitespace at the end of the sentence. Is there an easy way to match the whitespace at the end but not include it in the result?

I realize this is probably an easy question, but I've never used regex before, so go easy :)

Try this:

"(?<=[.!?])\\s+"

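This uses a lookbehind (one of the lookaround assertions) to match \s+ (written "\\s+" in a Java string literal) only where it is preceded by one of [.!?]. Because the lookbehind is zero-width, the punctuation is not part of the delimiter and stays with the sentence.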
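
As a minimal sketch of how that delimiter could be wired into a Scanner: the class name SentenceScanner, the helper sentences, and the sample text are made up for illustration, and the Scanner reads from a String here rather than from the asker's file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class SentenceScanner {
    // Hypothetical helper: collects every token the Scanner produces
    // when whitespace following '.', '!' or '?' is the delimiter.
    static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            // The lookbehind consumes nothing, so only the whitespace is
            // treated as the delimiter and the punctuation stays in the token.
            scanner.useDelimiter("(?<=[.!?])\\s+");
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String s : sentences("It works. Does it? Yes!  Done.")) {
            System.out.println("[" + s + "]");
        }
    }
}
```

The brackets in the printed output make it easy to check that no token carries trailing whitespace.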


If you want to remove the punctuation as well, just include it as part of the match:

"[.!?]+\\s+"

This will split "ORLY!?!? LOL" into "ORLY" and "LOL".
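
A quick way to check this is with String.split, shown here as a sketch; the class name PunctuationSplit and the helper splitDroppingPunctuation are hypothetical. Note that punctuation at the very end of the input is kept, because the pattern requires whitespace after it.

```java
import java.util.Arrays;

class PunctuationSplit {
    // Hypothetical helper: splits on one or more of '.', '!', '?'
    // followed by whitespace, so both are dropped from the pieces.
    static String[] splitDroppingPunctuation(String text) {
        return text.split("[.!?]+\\s+");
    }

    public static void main(String[] args) {
        // prints [ORLY, LOL]
        System.out.println(Arrays.toString(splitDroppingPunctuation("ORLY!?!? LOL")));
        // prints [One, Two, Three.] since the last '.' has no whitespace after it
        System.out.println(Arrays.toString(splitDroppingPunctuation("One. Two. Three.")));
    }
}
```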

What you're looking for is a positive lookahead. This should do it:

Pattern.compile("\\w*[.?!](?=\\s)")
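
To see what the lookahead does, here is a small sketch that runs the pattern through a Matcher rather than as a Scanner delimiter; the class name LookaheadDemo, the helper sentenceEnds, and the sample text are assumptions for illustration. Each match ends at the punctuation, and the whitespace is checked but not included. Since \w* only covers the final word, the matches are sentence endings rather than whole sentences.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // Last word plus punctuation, only when whitespace follows.
    // (?=\s) asserts the whitespace without consuming it.
    static final Pattern SENTENCE_END = Pattern.compile("\\w*[.?!](?=\\s)");

    // Hypothetical helper: returns every match found in the text.
    static List<String> sentenceEnds(String text) {
        List<String> ends = new ArrayList<>();
        Matcher m = SENTENCE_END.matcher(text);
        while (m.find()) {
            ends.add(m.group());
        }
        return ends;
    }

    public static void main(String[] args) {
        // "End." is not matched because no whitespace follows it.
        System.out.println(sentenceEnds("Hello world. Bye now! End."));
    }
}
```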
