Regular expression \b in Java and JavaScript
Is there any difference in how the regular expression \b behaves in Java and JavaScript?
I tried the test below.
In JavaScript:
console.log(/\w+\b/.test("test中文"));//true
In Java:
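import java.util.regex.Matcher;
import java.util.regex.Pattern;

String regEx = "\\w+\\b";   // one or more word characters, then a word boundary
String text = "test中文";
Pattern pattern = Pattern.compile(regEx);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
    System.out.println("matched"); // never executed
}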
Why do the two examples above give different results?
That is because, by default, Java supports Unicode for \b but not for \w, while JavaScript doesn't support Unicode for either.
So \w can only match [a-zA-Z0-9_] characters (in our case, test), but \b does not accept the position marked with | in
test|中文
as a boundary between alphabetic and non-alphabetic characters, because Unicode considers both t and 中 to be alphabetic.
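The mismatch described above can be checked directly. The sketch below is only an illustration: the class name WordCharCheck and its two helper methods are made up for this example, and Character.isLetterOrDigit is used as a stand-in for "alphabetic according to Unicode", not as a statement about the regex engine's internals. The escape \u4e2d is the character 中 from the question.

```java
import java.util.regex.Pattern;

final class WordCharCheck {
    // Without extra flags, \w in java.util.regex is ASCII-only: [a-zA-Z_0-9].
    private static final Pattern DEFAULT_WORD = Pattern.compile("\\w");

    // True if the whole string is a single character accepted by the default \w.
    static boolean matchesDefaultWordClass(String s) {
        return DEFAULT_WORD.matcher(s).matches();
    }

    // True if Unicode classifies the character as a letter or a digit.
    static boolean isUnicodeLetterOrDigit(char c) {
        return Character.isLetterOrDigit(c);
    }

    public static void main(String[] args) {
        System.out.println(matchesDefaultWordClass("t"));        // true
        System.out.println(matchesDefaultWordClass("\u4e2d"));   // false
        System.out.println(isUnicodeLetterOrDigit('t'));         // true
        System.out.println(isUnicodeLetterOrDigit('\u4e2d'));    // true
    }
}
```

Both characters count as letters for Unicode, yet only t is accepted by the default \w, which is the asymmetry the answer relies on.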
If you want a \b that ignores Unicode, you can use look-arounds and rewrite it as (?:(?<=\\w)(?!\\w)|(?<!\\w)(?=\\w)) (written here as it would appear in a Java string literal), or, for this example, simply use (?!\\w) instead of \\b.
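As a rough sketch of the look-around approach, the following tries both replacements against the input from the question. The class name AsciiBoundaryDemo, the constant ASCII_BOUNDARY, and the helper firstMatch are hypothetical names chosen for this example; the escapes \u4e2d\u6587 spell 中文.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AsciiBoundaryDemo {
    // Look-around replacement for \b that is defined purely in terms of \w.
    static final String ASCII_BOUNDARY = "(?:(?<=\\w)(?!\\w)|(?<!\\w)(?=\\w))";

    // Returns the first match of regex in text, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "test\u4e2d\u6587";
        // Full replacement: a boundary on either side of a \w run.
        System.out.println(firstMatch("\\w+" + ASCII_BOUNDARY, text)); // test
        // Shorter form that is enough here: the run must not be followed by \w.
        System.out.println(firstMatch("\\w+(?!\\w)", text));           // test
    }
}
```

Because both patterns use only \w, they do not depend on how \b itself classifies 中.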
If you also want \w to support Unicode, compile your pattern with the Pattern.UNICODE_CHARACTER_CLASS flag (which can also be written as the flag expression (?U)).
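A minimal sketch of both spellings of the flag, applied to the pattern from the question. The class name UnicodeWordDemo, the two constants, and the helper firstMatch are assumptions made for illustration; the escapes \u4e2d\u6587 spell 中文.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnicodeWordDemo {
    // Same pattern as in the question, with the flag passed to compile().
    static final Pattern VIA_FLAG =
            Pattern.compile("\\w+\\b", Pattern.UNICODE_CHARACTER_CLASS);

    // Equivalent form using the embedded flag expression (?U).
    static final Pattern VIA_INLINE = Pattern.compile("(?U)\\w+\\b");

    // Returns the first match of the pattern in text, or null if there is none.
    static String firstMatch(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "test\u4e2d\u6587";
        // With the flag, \w also accepts the CJK characters,
        // so each call returns the entire input string.
        System.out.println(text.equals(firstMatch(VIA_FLAG, text)));   // true
        System.out.println(text.equals(firstMatch(VIA_INLINE, text))); // true
    }
}
```

With the flag set, \w and \b agree on what a word character is, so the match runs to the end of the string.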
The Java regex looks for a sequence of word characters, i.e. [a-zA-Z_0-9]+, preceding a word boundary. But 中文 doesn't fit \w. If you use \b alone, you'll find two matches: the beginning and the end of the string.
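To see where \b matches on its own, one can collect the start offsets of its zero-length matches. This is only a sketch: the class name BoundaryPositions and the helper positions are invented for the example, and the escapes \u4e2d\u6587 spell 中文. The exact offsets reported for a plain \b depend on how the running JDK classifies 中 for \b, so the sketch only counts those matches and uses (?U) to pin the offsets down.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BoundaryPositions {
    // Collects the start offset of every match of regex in text.
    static List<Integer> positions(String regex, String text) {
        List<Integer> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "test\u4e2d\u6587";
        // Plain \b: two zero-length matches.
        System.out.println(positions("\\b", text).size()); // 2
        // With (?U), all six characters are word characters,
        // so the only boundaries are the start and the end.
        System.out.println(positions("(?U)\\b", text));    // [0, 6]
    }
}
```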
As georg has pointed out, JavaScript doesn't interpret characters the same way Java's regex engine does.