Sentence formation: Punctuation checks in Java

I want to check the quality of sentence formation. Specifically, I want to see whether the end user types a space after each punctuation mark. I am fine with either an NLP library or a simple Java regex solution.

For example:

  1. "Hi, my name is Tom Cruise. I like movies"
  2. "Hi,my name is Tom Cruise. I like movies"
  3. "Hi,my name is Tom Cruise.I like movies"

Sentence 1 is perfect. Sentence 2 is bad, since it has one punctuation mark without a space after it. Sentence 3 is the worst, since none of its punctuation marks is followed by a space.

Can you please suggest a Java approach to this? I tried the LanguageTool API, but it didn't work.

Why don't you try Patterns and Unicode categories?

For instance:

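import java.util.regex.Matcher;
import java.util.regex.Pattern;

// \p{P} matches any Unicode punctuation character; the trailing space is literal.
Pattern pattern = Pattern.compile("\\p{P} ");
Matcher matcher = pattern.matcher("Hi, my name is Tom Cruise. I like movies");
while (matcher.find()) {
    System.out.println(matcher.group());
}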

The Pattern here searches for any punctuation character followed by a space. The output will be:

, 
. 

(Notice the space after the comma and the period.)
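
As a side note on what the Unicode category covers, here is a small sketch that tests a few single characters against \p{P}. The class and method names are made up for illustration. Currency and math symbols such as $ and + belong to the symbol categories, not to punctuation, so they do not match.

```java
import java.util.regex.Pattern;

class PunctuationCategoryDemo {
    // Hypothetical helper: true if the one-character string s is in
    // Unicode general category P (punctuation).
    static boolean isPunctuation(String s) {
        return Pattern.matches("\\p{P}", s);
    }

    public static void main(String[] args) {
        // Comma, period, exclamation mark, apostrophe and hyphen are punctuation;
        // the dollar sign and the plus sign are symbols.
        for (String s : new String[] {",", ".", "!", "'", "-", "$", "+"}) {
            System.out.println("'" + s + "' -> " + isPunctuation(s));
        }
    }
}
```

Note that the apostrophe counts as punctuation, which matters for words like "don't".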

You could probably refine your Pattern by specifying exactly which punctuation characters should be followed by a space.
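
One way to do that refinement is to replace \p{P} with an explicit character class. The sketch below assumes that only , . ; : ! ? need a trailing space. That set, and the names RefinedPunctuation and countSpaced, are illustrative choices rather than anything from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RefinedPunctuation {
    // Assumption: only these marks are expected to be followed by a space.
    private static final Pattern SPACED = Pattern.compile("[,.;:!?] ");

    // Counts punctuation marks from the set above that are followed by a space.
    static int countSpaced(String text) {
        Matcher matcher = SPACED.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countSpaced("Hi, my name is Tom Cruise. I like movies")); // prints 2
        System.out.println(countSpaced("Hi,my name is Tom Cruise.I like movies"));   // prints 0
    }
}
```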

Finally, to check for the opposite (a punctuation character that is not followed by whitespace):

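Pattern otherPattern = Pattern.compile("\\p{P}\\S");

Building on that pattern, you can count the violations in each sentence:

// Matches a punctuation character immediately followed by a non-whitespace character.
Pattern pattern = Pattern.compile("\\p{P}\\S");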

String[] tests = new String[] {
    "Hi, my name is Tom Cruise. I like movies",
    "Hi,my name is Tom Cruise. I like movies",
    "Hi,my name is Tom Cruise.I like movies"
};

int[] results = new int[] { 0, 0, 0 };

for (int i = 0; i < tests.length; i++) {
    Matcher matcher = pattern.matcher(tests[i]);
    while(matcher.find()) {
        results[i] += 1;
    }
    if (results[i] == 0) {
        System.out.println("Sentence " + (i + 1) + " is perfect");
    } else if (results[i] == 1) {
        System.out.println("Sentence " + (i + 1) + " is bad");
    } else {
        System.out.println("Sentence " + (i + 1) + " is the worst");
    }
}
// now you know how many violations there were on every line.
// do whatever you want with them.
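
One caveat with \p{P}\S is that it also flags the apostrophe in "don't" and the decimal point in "3.50", because both are punctuation followed by a non-space character. A possible stricter variant is sketched below. It assumes that a violation is one of , . ; : ! ? immediately followed by a letter, and it uses a lookahead so the letter is checked without being consumed. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpacingViolations {
    // Assumption: a violation is one of , . ; : ! ? directly followed by a letter.
    // (?=\p{L}) is a lookahead: it inspects the next character without consuming it.
    private static final Pattern VIOLATION = Pattern.compile("[,.;:!?](?=\\p{L})");

    // Counts the punctuation marks that are missing a following space.
    static int countViolations(String text) {
        Matcher matcher = VIOLATION.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String[] tests = {
            "Hi, my name is Tom Cruise. I like movies",
            "Hi,my name is Tom Cruise. I like movies",
            "Hi,my name is Tom Cruise.I like movies",
            "I don't owe you 3.50, do I?"
        };
        for (String test : tests) {
            System.out.println(countViolations(test) + " violation(s): " + test);
        }
    }
}
```

This still reports false positives for text such as "e.g." or "example.com", so abbreviations and domain names would need extra handling.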

