Why doesn't this regular expression using a word boundary (\b) match the example in .NET?

This should be simple enough, but the fact that it isn't working is baffling me; any insight into why would be greatly appreciated.
I'm trying to match any instance of an abbreviated word followed by any number of trailing '.', '/' or '-' characters. Notice that I'm using '\b' to try to grab the whole 'word', including the trailing characters mentioned above but not any following characters (it also has the advantage of matching against the end of the line or string). I'm using the following expression:

(?<target>\bLLC[\./\-]+\b)  

As an example, I'm trying to make it match this:

Ace Charter High School LLC. East Liberty  

I want the expression to select 'LLC.', but instead it finds no matches at all, and I don't know why.
I've tried debugging the expression in RegexBuddy, and it works if I remove the trailing '\b', but that's not what I want, as I explained above.

Does anyone have any idea why this isn't working?

There is no word boundary that matches the last \b.

The closest word boundaries are after LLC (between the C and the period) and before East (between the space and the E), and your pattern doesn't allow the last \b to be at either of those places.
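
The boundary positions described above can be checked directly. This is a minimal sketch in Python rather than .NET, which is an assumption made purely for illustration: Python's re module writes the named group as (?P<target>...) where .NET uses (?<target>...), while \b behaves the same way on this ASCII example. The variable names are made up for the demonstration.

```python
import re

text = "Ace Charter High School LLC. East Liberty"

# Offsets of every zero-width position where \b matches.
boundaries = [m.start() for m in re.finditer(r"\b", text)]

dot = text.index("LLC.") + 3      # offset of the '.'
print(dot in boundaries)          # position between 'C' and '.'
print(dot + 1 in boundaries)      # position between '.' and ' '
print(dot + 2 in boundaries)      # position between ' ' and 'E'

# The question's pattern, with the named group in Python syntax.
with_b = re.search(r"(?P<target>\bLLC[\./\-]+\b)", text)
# The same pattern with the trailing \b removed.
without_b = re.search(r"(?P<target>\bLLC[\./\-]+)", text)
print(with_b, without_b.group("target"))
```

Only the middle check is false: the position right after the period sits between two non-word characters, so the trailing \b cannot succeed there and the search with it returns no match.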

Try:

(?<target>\bLLC[\./\-]+)\s*\b

This allows whitespace before the word boundary (which is between the space and the E, as Guffa points out) without including those spaces in the match group "target".

On the other hand, matching a word boundary after the . isn't gaining you much, since punctuation already produces a word boundary wherever it touches a word character; there is none only where it sits next to other non-word characters, such as more punctuation or whitespace.
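
To see what the suggested \s*\b variant does, here is a short sketch, again in Python as an assumed stand-in for .NET (so the group is spelled (?P<target>...) instead of (?<target>...)). The second input, with the abbreviation at the very end of the string, is a made-up case added for comparison.

```python
import re

# The suggested pattern, with the named group in Python syntax.
pattern = re.compile(r"(?P<target>\bLLC[\./\-]+)\s*\b")

m = pattern.search("Ace Charter High School LLC. East Liberty")
target = m.group("target")   # only the abbreviation and its punctuation
whole = m.group(0)           # the overall match also spans the whitespace

# Hypothetical input where nothing follows the abbreviation.
at_end = pattern.search("Ace Charter High School LLC.")

print(repr(target), repr(whole), at_end)
```

The named group stays free of the whitespace, but the pattern still requires a word character after the optional whitespace, so it finds nothing when the abbreviation ends the string.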

I've had good responses that pointed me in the right direction, but none really proposed an alternative to '\b' that targets the same text and also matches separator characters as well as the end of the string.
As Guffa pointed out, the issue is that I was using '\b' as a way to match any separator character or the end of the string, at the position just before that separator, when in reality it does exactly what its name says: it marks a word boundary. The position after the '.' is already outside a word, so it is neither the beginning nor the end of one and the '\b' fails there; since the '\b' after the target is required, there are no matches in the whole string.
I've finally settled on the following expression:

(?<target>\bLLC[\./\-]+)([^a-zA-Z0-9]|$)

The trailing group matches any non-alphanumeric character or the end of the string, so the 'target' group captures the abbreviation and its trailing punctuation without any of the separator characters before or after it, which is the effect I wanted in the first place. Thanks again for the responses; hopefully this will help others with a similar problem.
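
A quick way to try the final expression against a few inputs is sketched below. It is written in Python as an assumed stand-in for .NET, so the named group is spelled (?P<target>...) rather than (?<target>...). Apart from the example from the question, the sample strings are invented for illustration.

```python
import re

# The final expression, with the named group in Python syntax.
pattern = re.compile(r"(?P<target>\bLLC[\./\-]+)([^a-zA-Z0-9]|$)")

samples = [
    "Ace Charter High School LLC. East Liberty",  # example from the question
    "Ace Charter High School LLC.",               # abbreviation ends the string
    "Acme LLC/- Holdings",                        # several trailing characters
    "Acme LLC.East",                              # alphanumeric right after '.'
]

results = {}
for s in samples:
    m = pattern.search(s)
    results[s] = m.group("target") if m else None
    print(f"{s!r} -> {results[s]!r}")
```

Note that the second group consumes the separator character when there is one, so the overall match is one character longer than the 'target' group; reading the named group is what keeps the separator out of the result.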
