Java Regex: check if a sentence contains only alphabet and numbers
The following program prints results that I don't understand; I suspect the cause is my limited understanding of Java regex. I want to split testStr by period first, then check whether each sentence contains letters or numbers. Surprisingly, I got the following output, which is the opposite of what I want:
blah blah1 is not a character!
blah blah2 is not a character!
blah blah3 is not a character!
??** is not a character! // only this output is expected
My code is below:
String testStr = "blah blah1. blah blah2. blah blah3. ??**...";
String[] myStrArray = testStr.split("[.]");
System.out.println("length of myStrArray is: " + myStrArray.length);
for (String str : myStrArray) {
    if (!Pattern.matches("\\w+", str)) {
        System.out.println(str + " is not a character!");
        continue;
    }
    System.out.println("got a meaningful sentence " + str.trim());
}
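Your program splits the string using the dot as a separator, so you get the following pieces (every piece after the first also starts with a space, and the trailing dots produce only empty strings, which split() discards):
blah blah1
 blah blah2
 blah blah3
 ??**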
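To check what split() really returns, a small sketch along these lines may help. The class name SplitDemo and the helper pieces() are made up for illustration; the brackets around each piece are only there to make leading spaces visible.

```java
class SplitDemo {
    // Hypothetical helper: splits on a literal dot, like the code in the question.
    static String[] pieces(String text) {
        return text.split("[.]");
    }

    public static void main(String[] args) {
        String testStr = "blah blah1. blah blah2. blah blah3. ??**...";
        String[] parts = pieces(testStr);
        System.out.println("number of pieces: " + parts.length);
        for (String piece : parts) {
            // Brackets expose the leading space that split() leaves in place.
            System.out.println("[" + piece + "]");
        }
    }
}
```

The trailing dots do not show up in any piece because String.split() with no limit argument removes trailing empty strings from its result.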
Then you try to match each piece using the regex \\w+. Note that Pattern.matches() succeeds only if the entire input matches, so it behaves like find() with a regex wrapped in ^ and $; in other words, treat your regex as ^\\w+$.
It should now be clear that none of your strings matches this pattern: the first three contain spaces, and the last contains neither letters nor digits.
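The difference between a whole-input match and a partial match can be seen in a short sketch. The class and method names here (MatchVsFind, wholeMatch, partialMatch) are illustrative only; the calls themselves are the standard java.util.regex API.

```java
import java.util.regex.Pattern;

class MatchVsFind {
    // True only if the entire input consists of word characters.
    static boolean wholeMatch(String input) {
        return Pattern.matches("\\w+", input);
    }

    // True if at least one run of word characters occurs anywhere in the input.
    static boolean partialMatch(String input) {
        return Pattern.compile("\\w+").matcher(input).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"blah", "blah blah1", " ??**"}) {
            System.out.println("[" + s + "] matches=" + wholeMatch(s)
                    + " find=" + partialMatch(s));
        }
    }
}
```

For "blah blah1" the whole-input match fails because of the space, while find() succeeds on the first word. For " ??**" both fail, since the piece contains no word characters at all.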
Change your regex to: ^[a-zA-Z0-9\\s]+$
It allows only letters, digits and whitespace, as required. Note that part of the "magic" is the use of ^ and $, which force a full match (from beginning to end). Pattern.matches() already requires a full match, so the anchors are redundant there, but they become essential if you switch to find().
Further, the reason I've used a-zA-Z0-9 instead of \\w is that \\w also matches _, which doesn't fit the requirements.
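As a rough sketch of how the loop from the question might look with this pattern, consider the following. The class name AlnumSentenceCheck and the method isMeaningful are invented for the example, and precompiling the pattern is a choice made here, not something the question requires.

```java
import java.util.regex.Pattern;

class AlnumSentenceCheck {
    // Letters, digits and whitespace only; the anchors make the intent explicit.
    private static final Pattern ALNUM_OR_SPACE =
            Pattern.compile("^[a-zA-Z0-9\\s]+$");

    // Hypothetical helper: true if the sentence has no other characters.
    static boolean isMeaningful(String sentence) {
        return ALNUM_OR_SPACE.matcher(sentence).matches();
    }

    public static void main(String[] args) {
        String testStr = "blah blah1. blah blah2. blah blah3. ??**...";
        for (String str : testStr.split("[.]")) {
            if (isMeaningful(str)) {
                System.out.println("got a meaningful sentence " + str.trim());
            } else {
                System.out.println(str + " is not a character!");
            }
        }
    }
}
```

One caveat with this pattern: a piece made up of whitespace only would also pass, so trimming and checking for an empty string first may be worthwhile if such input is possible.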
You can use a character set. Change the regex you are using ("\\w+") to this:
"[\\s[^\\W_]]+"
This allows alphanumeric characters ([^\\W_] => a-zA-Z0-9) and whitespace (\\s) to be matched, instead of only word characters. The nested brackets form a union of the two sets; writing && between them would instead form an intersection, which no character can satisfy, and the trailing + is needed so that more than one character can match.
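Java character classes support both union (a nested class) and intersection (the && operator), and the two are easy to confuse. The sketch below compares them on the same inputs; the class name CharClassDemo and the constant names are assumptions made for illustration.

```java
class CharClassDemo {
    // Union: whitespace OR an alphanumeric character ([^\W_] is \w without '_').
    static final String UNION = "[\\s[^\\W_]]+";

    // Intersection: a character that is whitespace AND alphanumeric at once.
    // No character satisfies both, so this pattern can never match.
    static final String INTERSECTION = "[\\s&&[^\\W_]]+";

    public static void main(String[] args) {
        System.out.println("blah blah1".matches(UNION));         // letters, digit, space
        System.out.println("blah_blah1".matches(UNION));         // underscore is excluded
        System.out.println("blah blah1".matches(INTERSECTION));  // empty class
    }
}
```

String.matches() requires the whole string to match, just like Pattern.matches(), so no explicit anchors are needed in either pattern.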