
Java Regex: check if a sentence contains only alphabet and numbers

My program below prints weird results that I don't understand; I guess it is because I don't know Java regex well. I want to split testStr on periods first, then check whether each sentence contains only letters and digits. Surprisingly, I got the following output, which is the opposite of what I wanted:

blah blah1 is not a character!
 blah blah2 is not a character!
 blah blah3 is not a character!
 ??** is not a character!     // only this output is expected

My code is below:

String testStr = "blah blah1. blah blah2. blah blah3. ??**...";
String[] myStrArray = testStr.split("[.]");

System.out.println("length of myStrArray is: " + myStrArray.length);

for (String str : myStrArray) {
    if (!Pattern.matches("\\w+", str)) {
        System.out.println(str + " is not a character!");
        continue;
    }

    System.out.println("got a meaningful sentence " + str.trim());

}

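Your program splits the string using a dot as the separator, so you get the following pieces (split() drops the trailing empty strings, and every piece after the first keeps its leading space):

blah blah1
 blah blah2
 blah blah3
 ??**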

Then you try to match each piece against the regex \\w+ . Please note that Pattern.matches() must match the entire input, so it behaves like find() with a regex anchored by ^ and $ , i.e. think of your regex as ^\\w+$ .

Now it is clear that none of your strings matches this pattern: the first three contain spaces, and the last contains neither letters nor digits.
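To make the difference concrete, here is a minimal sketch that splits the question's string and compares a whole-input match with a substring search. The class and method names (SplitAndMatchDemo, splitSentences, wholeMatch, partialMatch) are made up for illustration; only String.split, Pattern.matches and Matcher.find come from the JDK.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative, not from the question.
class SplitAndMatchDemo {

    // Splits on literal dots. String.split drops trailing empty strings.
    static String[] splitSentences(String text) {
        return text.split("[.]");
    }

    // Pattern.matches succeeds only if the WHOLE input matches the regex.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Matcher.find succeeds if ANY substring of the input matches the regex.
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String testStr = "blah blah1. blah blah2. blah blah3. ??**...";
        for (String piece : splitSentences(testStr)) {
            System.out.println("[" + piece + "]"
                    + " whole=" + wholeMatch("\\w+", piece)
                    + " partial=" + partialMatch("\\w+", piece));
        }
    }
}
```

For the first three pieces the whole-input match fails because of the space, while the substring search succeeds; for the last piece both fail because it has no word characters at all.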

Change your regex to ^[a-zA-Z0-9\\s]+$ ; it allows only letters, digits and whitespace, as required. Pay attention that part of the "magic" is the use of ^ and $ , which force a full match (from beginning to end). Pattern.matches() already requires a full match, so the anchors are redundant there, but they are needed if you switch to find().

Further, the reason I've used a-zA-Z0-9 instead of \\w is that \\w includes _ , which doesn't fit the requirements.
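As a rough sketch of how the question's loop might look with this character class: the helper name isPlainSentence and the class name SentenceFilter are assumptions for illustration, and trimming before matching is an extra choice made here so that a piece consisting only of whitespace is rejected.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; names are illustrative only.
class SentenceFilter {

    // Letters, digits and whitespace only. Matcher.matches() checks the
    // whole input, so no ^ and $ anchors are needed here.
    private static final Pattern ALNUM_AND_SPACES = Pattern.compile("[a-zA-Z0-9\\s]+");

    // Trims first (an assumption of this sketch), so an all-whitespace
    // piece becomes empty and fails the "+" quantifier.
    static boolean isPlainSentence(String sentence) {
        return ALNUM_AND_SPACES.matcher(sentence.trim()).matches();
    }

    public static void main(String[] args) {
        String testStr = "blah blah1. blah blah2. blah blah3. ??**...";
        for (String str : testStr.split("[.]")) {
            if (!isPlainSentence(str)) {
                System.out.println(str + " is not a character!");
                continue;
            }
            System.out.println("got a meaningful sentence " + str.trim());
        }
    }
}
```

With the question's input, the first three pieces are reported as meaningful sentences and only the ??** piece is rejected.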

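You can use a character class. Change the regex you are using ( "\\w+" ) to this:

"[\\s[^\\W_]]+"

This allows alphanumerics ( [^\\W_] is equivalent to a-zA-Z0-9 ) and whitespace ( \\s ) to be matched, instead of only word characters. Note that the two classes are combined as a union by nesting one inside the other; the && operator would mean intersection, and no character is both whitespace and alphanumeric. The trailing + lets the class match the whole sentence rather than a single character.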
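Inside a Java character class, nesting one class in another forms a union, while && forms an intersection. The sketch below compares the two forms on a few inputs; the class and method names (CharClassDemo, matchesUnion, matchesIntersection) are hypothetical, and a + quantifier is added to both patterns so they can cover a whole string.

```java
// Hypothetical demo class; names are illustrative only.
class CharClassDemo {

    // Union: whitespace OR (word characters except underscore).
    static boolean matchesUnion(String s) {
        return s.matches("[\\s[^\\W_]]+");
    }

    // Intersection: whitespace AND alphanumeric at the same time.
    // No character satisfies both, so this never matches.
    static boolean matchesIntersection(String s) {
        return s.matches("[\\s&&[^\\W_]]+");
    }

    public static void main(String[] args) {
        String[] samples = {"blah blah1", "blah_blah", "??**"};
        for (String s : samples) {
            System.out.println("[" + s + "]"
                    + " union=" + matchesUnion(s)
                    + " intersection=" + matchesIntersection(s));
        }
    }
}
```

Only the first sample passes the union form; the underscore and the punctuation are rejected, and the intersection form rejects everything.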
