Regular expression fails on unicode
I'm trying to find the string "C#" in a text using PHP and a regular expression.
I'm using
\bc\x{0023}\b
But it doesn't work at all.
\bc\x{0023}
works, but that's not a solution for me.
Any clue?
It's because the escape sequence \b means a word boundary. The PHP manual defines a word as follows: "A "word" character is any letter or digit or the underscore character, that is, any character which can be part of a Perl "word"."
A word boundary is the boundary between a word and a non-word. In other words, it is a position between a character that is a word character and a character that is not. The problem is that # is not a word character. Thus, unless # is followed by a word character, #\b will never match.
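To make the boundary rule concrete, here is a minimal sketch that runs the two patterns from the question against a few sample strings. The sample strings and the variable names are my own, chosen only for illustration.

```php
<?php
// Pattern from the question: demands a word boundary right after '#'.
$strict = '/\bc\x{0023}\b/i';
// Same pattern without the trailing \b.
$loose = '/\bc\x{0023}/i';

// Illustrative inputs (not from the original post).
$samples = [
    'I like C# a lot', // '#' followed by a space: no boundary after '#'
    'C#',              // '#' at end of string: no boundary after '#'
    'C#foo',           // '#' followed by a word character: boundary exists
    'abc#',            // 'c' is preceded by a word character: leading \b fails
];

foreach ($samples as $text) {
    printf(
        "%-18s strict=%d loose=%d\n",
        $text,
        preg_match($strict, $text),
        preg_match($loose, $text)
    );
}
```

The strict pattern only matches "C#foo", which is exactly the case the asker does not want, while the loose pattern matches every sample except "abc#".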
Perhaps you should define more clearly what you want, using character classes. For example,
/\bc#(?![a-z])/i
(that is, C# that is not followed by a character in the a-z range).
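As a rough check of the lookahead approach, the sketch below tries the suggested pattern on a handful of inputs. The inputs are assumptions made for illustration, and whether a digit after "C#" should count as a match depends on what you actually need.

```php
<?php
// Suggested pattern: "c#" at a word start, not followed by a letter.
// The /i flag makes both the "c" and the [a-z] class case-insensitive.
$pattern = '/\bc#(?![a-z])/i';

// Illustrative inputs (not from the original post).
$samples = [
    'I use C# daily', // matches: '#' is followed by a space
    'C#',             // matches: '#' is at the end of the string
    'c#.',            // matches: '#' is followed by punctuation
    'C#foo',          // no match: '#' is followed by a letter
    'abc#',           // no match: no word boundary before 'c'
    'C#7',            // matches: digits are outside the a-z class
];

foreach ($samples as $text) {
    echo str_pad($text, 16), preg_match($pattern, $text) ? 'match' : 'no match', "\n";
}
```

If digits or underscores after "C#" should also be rejected, one possible variation is to replace [a-z] with \w in the lookahead.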