Java Regex: check whether a sentence contains only letters and digits

Question

My program below prints weird results that I don't understand. I guess it is due to my limited understanding of Java regex. I want to split testStr on periods first, then check whether each sentence contains only letters and digits. But surprisingly, I got the following output, which is the opposite of what I wanted:

blah blah1 is not a character!
 blah blah2 is not a character!
 blah blah3 is not a character!
 ??** is not a character!     // only this output is expected

My code:

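import java.util.regex.Pattern;

public class Main { // class name is arbitrary
    public static void main(String[] args) {
        String testStr = "blah blah1. blah blah2. blah blah3. ??**...";
        // Split on literal periods; "[.]" is a character class, so the dot is not a wildcard
        String[] myStrArray = testStr.split("[.]");

        System.out.println("length of myStrArray is: " + myStrArray.length);

        for (String str : myStrArray) {
            // Pattern.matches() succeeds only if the whole string matches \w+
            if (!Pattern.matches("\\w+", str)) {
                System.out.println(str + " is not a character!");
                continue;
            }

            System.out.println("got a meaningful sentence " + str.trim());
        }
    }
}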

Answer 1

Your program splits the string using the dot as a separator, so you get:

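"blah blah1"
" blah blah2"
" blah blah3"
" ??**"

(The empty strings produced by the trailing "..." are dropped, because String.split() discards trailing empty strings.)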
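A minimal sketch to check this yourself (the class SplitDemo and the method splitSentences are hypothetical names). It prints each piece between brackets so the leading spaces are visible.

```java
public class SplitDemo {
    // Split on literal periods, exactly as in the question
    static String[] splitSentences(String text) {
        return text.split("[.]");
    }

    public static void main(String[] args) {
        String[] parts = splitSentences("blah blah1. blah blah2. blah blah3. ??**...");
        System.out.println(parts.length);
        for (String p : parts) {
            // Brackets make leading and trailing spaces visible
            System.out.println("[" + p + "]");
        }
    }
}
```

Running it prints 4, followed by [blah blah1], [ blah blah2], [ blah blah3] and [ ??**].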

Then you try to match each piece against the regex \\w+. Note that Pattern.matches() requires the entire input to match, so it behaves like find() with the regex wrapped in ^ and $; in effect, your regex is ^\\w+$.

Now it should be clear that none of your strings matches this pattern: the first three contain spaces, and the last contains neither letters nor digits.
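A small sketch comparing Pattern.matches() with Matcher.find() on pieces from the split; the class MatchesVsFind and its two helpers are hypothetical names used only for illustration.

```java
import java.util.regex.Pattern;

public class MatchesVsFind {
    // Whole-input match, which is what Pattern.matches does
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Looks for the pattern anywhere inside the input
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String s = "blah blah1";
        System.out.println(fullMatch("\\w+", s));          // false: the space is not a word character
        System.out.println(partialMatch("\\w+", s));       // true: "blah" is found
        System.out.println(partialMatch("^\\w+$", s));     // false: the anchors force a full match
        System.out.println(partialMatch("\\w+", " ??**")); // false: no word character at all
    }
}
```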

Answer 2

Change your regex to ^[a-zA-Z0-9\\s]+$. It allows only letters, digits and whitespace, as required. Part of the "magic" is the use of ^ and $, which force a full match from beginning to end (Pattern.matches() already requires a full match, so the anchors matter mainly if you use find() instead).

I used a-zA-Z0-9 instead of \\w because \\w also matches the underscore _, which doesn't fit the requirements.
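A minimal sketch of this pattern dropped into the question's loop, assuming the same testStr; the class AlnumSentenceCheck and the method isMeaningful are hypothetical names. The pattern is compiled once instead of calling Pattern.matches() per sentence, which would recompile it on every iteration.

```java
import java.util.regex.Pattern;

public class AlnumSentenceCheck {
    // Letters, digits and whitespace only, anchored at both ends
    private static final Pattern ALNUM_OR_SPACE = Pattern.compile("^[a-zA-Z0-9\\s]+$");

    static boolean isMeaningful(String sentence) {
        return ALNUM_OR_SPACE.matcher(sentence).matches();
    }

    public static void main(String[] args) {
        String testStr = "blah blah1. blah blah2. blah blah3. ??**...";
        for (String str : testStr.split("[.]")) {
            if (!isMeaningful(str)) {
                System.out.println(str + " is not a character!");
                continue;
            }
            System.out.println("got a meaningful sentence " + str.trim());
        }
    }
}
```

With this pattern the three "blah" sentences are reported as meaningful, and only " ??**" is rejected. An input such as "snake_case" is also rejected, because the underscore is not in the class.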

Answer 3

You can use a character class. Change the regex you are using ("\\w+") to this:

"[\\s[^\\W_]]+"

This allows alphanumerics ([^\\W_] => a-zA-Z0-9) and whitespace (\\s) to be matched, instead of only word characters. Nesting one class inside the other takes their union; writing && between them would take the intersection instead, which is empty, so nothing would match. The + quantifier is needed because Pattern.matches() must consume the whole sentence.
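A short sketch contrasting union and intersection inside a Java character class; the class CharClassUnion and its helpers are hypothetical names.

```java
import java.util.regex.Pattern;

public class CharClassUnion {
    // Union: whitespace OR a word character other than '_', i.e. [a-zA-Z0-9] plus \s
    static boolean unionMatches(String input) {
        return Pattern.matches("[\\s[^\\W_]]+", input);
    }

    // Intersection: whitespace AND alphanumeric; no character is both, so this never matches
    static boolean intersectionMatches(String input) {
        return Pattern.matches("[\\s&&[^\\W_]]+", input);
    }

    public static void main(String[] args) {
        System.out.println(unionMatches("blah blah1"));        // true
        System.out.println(unionMatches(" ??**"));             // false
        System.out.println(intersectionMatches("blah blah1")); // false
    }
}
```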

3 answers

Answer 1
Score 3, accepted, 2014-09-21 06:30:23

Answer 2
Score 2, 2014-09-21 06:30:00

Answer 3
Score 0, 2014-09-21 12:51:41
