\b doesn't match when the preceding character is a word boundary
I have a rather peculiar problem. I'm trying to find a pattern like [some string][word boundary]. Simplified, my code is:
final Pattern pattern = Pattern.compile(Pattern.quote(someString) + "\\b");
final String value = someString + " ";
System.out.println(pattern.matcher(value).find());
My logic tells me this should always output true, regardless of what someString is. However:
if someString ends with a word character (e.g. "abc"), true is output;
if someString ends with a word boundary (e.g. "abc."), false is output.
Any ideas what is happening? My current workaround is to use \\W instead of \\b, but I'm not sure of the implications.
A dot followed by a space is not a word boundary.
A word boundary lies between a word character and a non-word character, or vice versa,
i.e. between [a-zA-Z0-9_][^a-zA-Z0-9_]
or [^a-zA-Z0-9_][a-zA-Z0-9_].
\b also matches at the very start or end of the input when the adjacent character is a word character.
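To make the rule above concrete, here is a minimal sketch that wraps the question's check in a helper and runs it on both example strings. The class name WordBoundaryDemo and the method boundaryAfter are made up for illustration; only Pattern.compile, Pattern.quote and Matcher.find come from the question.

```java
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // Hypothetical helper mirroring the question: the quoted literal followed by \b,
    // searched for in the literal plus a trailing space.
    static boolean boundaryAfter(String someString) {
        Pattern pattern = Pattern.compile(Pattern.quote(someString) + "\\b");
        return pattern.matcher(someString + " ").find();
    }

    public static void main(String[] args) {
        // 'c' then ' ': word character followed by non-word character, so \b matches.
        System.out.println(boundaryAfter("abc"));   // true
        // '.' then ' ': two non-word characters, so there is no boundary between them.
        System.out.println(boundaryAfter("abc."));  // false
    }
}
```

The only position where \b could match is right after the literal, so the result depends entirely on whether the literal's last character is a word character.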
A word boundary is the position between a word character and a non-word character, in either order; it is zero-width and does not consume a character itself. The position between a period and a space (two non-word characters) does not meet this requirement.
The effect of using \\W is that any single non-word character following someString will be matched, with no condition on what precedes it, which seems correct for your example. Unlike \\b, however, \\W consumes that character as part of the match, and it fails when someString sits at the very end of the input because there is no character left to match.
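As a rough illustration of those implications, the sketch below compares the \\W workaround with a negative lookahead, (?!\\w). The lookahead is an alternative that does not appear in the original thread, and the class and method names are invented for this example.

```java
import java.util.regex.Pattern;

public class NonWordSuffixDemo {
    // The workaround from the question: require one non-word character after the literal.
    static boolean followedByNonWord(String literal, String input) {
        return Pattern.compile(Pattern.quote(literal) + "\\W").matcher(input).find();
    }

    // Assumed alternative (not from the thread): a negative lookahead that only asserts
    // that the next character is not a word character, without consuming anything.
    static boolean notFollowedByWord(String literal, String input) {
        return Pattern.compile(Pattern.quote(literal) + "(?!\\w)").matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(followedByNonWord("abc.", "abc. ")); // true
        System.out.println(followedByNonWord("abc.", "abc."));  // false: nothing left for \W to match
        System.out.println(notFollowedByWord("abc.", "abc."));  // true: end of input is acceptable
        System.out.println(notFollowedByWord("abc", "abcd"));   // false: 'd' is a word character
    }
}
```

If the literal can appear at the end of the input, or the trailing character should not be part of the match, the lookahead form may be the better fit; otherwise \\W behaves as described above.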