
\b doesn't match when the preceding character is a word boundary

I have a rather peculiar problem. I'm trying to find a pattern like [some string][word boundary]. Simplified, my code is:

final Pattern pattern = Pattern.compile(Pattern.quote(someString) + "\\b");
final String value = someString + " ";
System.out.println(pattern.matcher(value).find());

My logic tells me this should always output true, regardless of what someString is. However:

  • if someString ends with a word character (e.g. "abc"), true is printed;
  • if someString ends with a word boundary (e.g. "abc."), false is printed.

Any ideas what is happening? My current workaround is to use \\W instead of \\b, but I'm not sure of the implications.

A dot followed by a space is not a word boundary.

A word boundary is the position between a word character and a non-word character, or vice versa,
i.e. between [a-zA-Z0-9_][^a-zA-Z0-9_] or [^a-zA-Z0-9_][a-zA-Z0-9_].
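
The definition can be checked against the code from the question. This is only a minimal sketch: the class name WordBoundaryDemo and the helper endsAtBoundary are made up for illustration and simply wrap the question's three lines.

```java
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // Hypothetical helper: runs the question's code for a given someString.
    public static boolean endsAtBoundary(String someString) {
        Pattern pattern = Pattern.compile(Pattern.quote(someString) + "\\b");
        String value = someString + " ";
        return pattern.matcher(value).find();
    }

    public static void main(String[] args) {
        // 'c' (word character) followed by ' ' (non-word): there is a boundary.
        System.out.println(endsAtBoundary("abc"));   // true
        // '.' (non-word) followed by ' ' (non-word): there is no boundary.
        System.out.println(endsAtBoundary("abc."));  // false
    }
}
```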

A word boundary is the position between a word character and a non-word character, in either order. A space preceded by a period (two non-word characters) does not meet this requirement.

The effect of using \\W is that any non-word character will be matched, without the condition that it is preceded by a word character, which seems correct for your example. Unlike \\b, which is zero-width, \\W consumes the character it matches and needs one to be present, so it will not match at the very end of the input.
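
To make the implications of swapping \\b for \\W concrete, here is a small sketch comparing the two suffixes. The class NonWordSuffixDemo and its helpers found and matchLength are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonWordSuffixDemo {
    // Hypothetical helper: does quote(someString) + suffix match anywhere in value?
    public static boolean found(String someString, String suffix, String value) {
        return Pattern.compile(Pattern.quote(someString) + suffix).matcher(value).find();
    }

    // Hypothetical helper: length of the first match, or -1 if there is none.
    public static int matchLength(String someString, String suffix, String value) {
        Matcher m = Pattern.compile(Pattern.quote(someString) + suffix).matcher(value);
        return m.find() ? m.end() - m.start() : -1;
    }

    public static void main(String[] args) {
        // \W accepts a following non-word character after either kind of ending.
        System.out.println(found("abc", "\\W", "abc "));    // true
        System.out.println(found("abc.", "\\W", "abc. "));  // true
        // \W must consume a character, so it fails at the end of the input,
        // where \b still succeeds after a word character.
        System.out.println(found("abc", "\\W", "abc"));     // false
        System.out.println(found("abc", "\\b", "abc"));     // true
        // The character matched by \W is part of the match; \b adds nothing.
        System.out.println(matchLength("abc", "\\W", "abc "));  // 4
        System.out.println(matchLength("abc", "\\b", "abc "));  // 3
    }
}
```

So if the matched text is later extracted or replaced, the \\W version includes one extra trailing character.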

