
Regular expression \b in Java and JavaScript

Is there any difference in how the regular expression \b works in Java and JavaScript?
I tried the following test:
In JavaScript:

console.log(/\w+\b/.test("test中文"));//true  

In Java:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String regEx = "\\w+\\b";
String text = "test中文";
Pattern pattern = Pattern.compile(regEx);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
    System.out.println("matched"); // never executed
}

Why are the results of the two examples above different?

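That is because, by default, Java supports Unicode for \b but not for \w, while JavaScript supports Unicode for neither. (This describes Java 18 and earlier. Since Java 19, \b is ASCII-only by default, consistent with \w, so on newer JDKs the Java snippet above prints "matched" as well.)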

So \w can only match [a-zA-Z0-9_] characters (in our case, test), but \b can't accept the place marked with | in

test|中文

as lying between an alphabetic and a non-alphabetic character, because by Unicode standards both t and 中 are alphabetic characters.
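The boundary test can be spelled out by hand. The following is a minimal sketch, not the JDK's actual implementation: the class BoundaryCheck and its predicates are hypothetical, and UNICODE_WORD assumes that a Unicode "word character" is Character.isLetterOrDigit plus '_', which is only an approximation of the real rule. It checks position 4 of "test中文" (the | above) under both definitions.

```java
import java.util.function.IntPredicate;

// Hypothetical helper (not from the original answer): decides whether a position
// in a string is a word boundary under a chosen definition of "word character".
class BoundaryCheck {

    // ASCII word characters, i.e. what Java's default \w matches: [a-zA-Z_0-9]
    static final IntPredicate ASCII_WORD =
            c -> (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || c == '_';

    // Assumed Unicode-aware word characters: letters and digits of any script, plus '_'
    static final IntPredicate UNICODE_WORD =
            c -> c == '_' || Character.isLetterOrDigit(c);

    // A boundary exists at index i when exactly one of the neighbouring characters
    // (the one before i and the one at i) is a word character.
    // Uses charAt, which is fine here because 中 and 文 are in the Basic Multilingual Plane.
    static boolean isBoundary(String s, int i, IntPredicate isWord) {
        boolean before = i > 0 && isWord.test(s.charAt(i - 1));
        boolean after = i < s.length() && isWord.test(s.charAt(i));
        return before != after;
    }

    public static void main(String[] args) {
        String text = "test中文";
        int pos = 4; // the position marked as test|中文
        System.out.println("ASCII boundary at 4:   " + isBoundary(text, pos, ASCII_WORD));   // true
        System.out.println("Unicode boundary at 4: " + isBoundary(text, pos, UNICODE_WORD)); // false
    }
}
```

Under the ASCII definition position 4 is a boundary, under the Unicode-aware one it is not, which matches the behavior described above.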

If you want a \b that ignores Unicode, you can use the look-around mechanism and rewrite it (as a Java string literal) as (?:(?<=\\w)(?!\\w)|(?<!\\w)(?=\\w)), or, in the case of this example, simply use (?!\\w) instead of \\b.
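A minimal sketch of the look-around replacement, assuming \w keeps its default ASCII meaning (no UNICODE_CHARACTER_CLASS flag). AsciiBoundaryDemo, ASCII_B and findAll are hypothetical names used only for illustration. Because the patterns never use \b, the result does not depend on how a particular JDK defines \b.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AsciiBoundaryDemo {

    // ASCII-only word boundary built from look-arounds:
    // (end of a \w run) OR (start of a \w run); \w is ASCII-only by default.
    static final String ASCII_B = "(?:(?<=\\w)(?!\\w)|(?<!\\w)(?=\\w))";

    // Hypothetical helper: collects every substring of text matched by regex.
    static List<String> findAll(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "test中文";
        System.out.println(findAll("\\w+" + ASCII_B, text)); // [test]
        System.out.println(findAll("\\w+(?!\\w)", text));    // [test]
    }
}
```

The shorter (?!\\w) works here only because \w+ already guarantees a word character on the left; the full look-around form is needed when the boundary can appear on its own.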

If you want \w to support Unicode as well, compile your pattern with the Pattern.UNICODE_CHARACTER_CLASS flag (which can also be written as the embedded flag expression (?U)).
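A short sketch of both ways to enable Unicode character classes; UnicodeClassDemo and firstMatch are illustrative names, not part of any API. With the flag, \w+ can consume 中文 as well, so the whole string matches and \b is found at the end of the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeClassDemo {

    // Hypothetical helper: returns the first match of p in text, or null if there is none.
    static String firstMatch(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "test中文";
        // With the flag, \w and \b both follow Unicode rules.
        Pattern viaFlag = Pattern.compile("\\w+\\b", Pattern.UNICODE_CHARACTER_CLASS);
        // The same flag written inline as the embedded flag expression (?U).
        Pattern viaInline = Pattern.compile("(?U)\\w+\\b");
        System.out.println(firstMatch(viaFlag, text));   // test中文
        System.out.println(firstMatch(viaInline, text)); // test中文
    }
}
```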

The Java regex looks for a sequence of word characters, i.e. [a-zA-Z_0-9]+, preceding a word boundary. But 中文 doesn't fit \w. If you use \\b alone, you'll find two matches: at the beginning and at the end of the string.
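To see where the boundaries actually are, the sketch below collects the start index of each zero-width match. It rests on one assumption: so that the output does not depend on a JDK's default \b semantics, it requests Unicode-aware boundaries explicitly with (?U), and uses the look-around form for the ASCII-only comparison. BoundaryPositions and positions are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryPositions {

    // Hypothetical helper: returns the start index of every (zero-width) match of regex in text.
    static List<Integer> positions(String regex, String text) {
        List<Integer> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "test中文";
        // Unicode-aware boundaries: every character is a letter, so only the two ends qualify.
        System.out.println(positions("(?U)\\b", text)); // [0, 6]
        // ASCII-only boundaries (look-around form): index 4, the edge of "test", now counts,
        // while the end of the string does not, because 文 is not an ASCII word character.
        System.out.println(positions("(?:(?<=\\w)(?!\\w)|(?<!\\w)(?=\\w))", text)); // [0, 4]
    }
}
```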

As georg pointed out, JavaScript doesn't interpret characters the same way Java's regex engine does.

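Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.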

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM