
Regular expression fails on Unicode

I'm trying to find the string "C#" in a text using PHP and a regular expression.

I'm using

\bc\x{0023}\b

But it doesn't work at all.

\bc\x{0023} 

works, but that's not a solution for me.

Any clue?

It's because the escape sequence \b means a word boundary. The PHP manual defines a word as follows: "A "word" character is any letter or digit or the underscore character, that is, any character which can be part of a Perl "word"."

A word boundary is the position between a word character and a non-word character (or between a word character and the start or end of the string). The problem is that # is not a word character. Thus, unless # is followed by a word character, #\b will never match.
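To make this concrete, here is a small sketch that runs the pattern from the question against three inputs. It writes # literally (the same character as \x{0023}) and adds the i flag so that an uppercase C also matches; both choices are assumptions, since the question does not show the delimiters or flags it used.

```php
<?php
// Pattern from the question, with a literal '#' and an assumed /i flag.
$pattern = '/\bc#\b/i';

$subjects = ['I like C# a lot', 'C#', 'c#x'];
$results = [];
foreach ($subjects as $subject) {
    // preg_match() returns 1 when the pattern matches and 0 when it does not.
    $results[$subject] = preg_match($pattern, $subject);
}

foreach ($results as $subject => $matched) {
    echo $subject, ' => ', $matched, PHP_EOL;
}
// 'I like C# a lot' => 0  ('#' is followed by a space: no boundary after '#')
// 'C#'              => 0  ('#' is followed by the end of the string)
// 'c#x'             => 1  ('#' is followed by the word character 'x')
```

Only the last input matches, which is the opposite of what the question wants.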

Perhaps you should define more clearly what you want, using character classes. For example, /\bc#(?![a-z])/i (that is, C# not followed by a character in the range a-z).
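A minimal sketch of that suggestion follows, reading the character class in the lookahead as the range a-z. The helper name mentionsCSharp is hypothetical and exists only for illustration.

```php
<?php
// Hypothetical helper: true when $text contains "C#" as its own token.
// \b         : a word boundary before the 'c'
// (?![a-z])  : negative lookahead, '#' must not be followed by a letter
// i          : case-insensitive, so the lookahead also rejects A-Z
function mentionsCSharp(string $text): bool
{
    return preg_match('/\bc#(?![a-z])/i', $text) === 1;
}

var_dump(mentionsCSharp('I like C# a lot')); // bool(true)
var_dump(mentionsCSharp('C#'));              // bool(true)
var_dump(mentionsCSharp('c#x'));             // bool(false): a letter follows '#'
var_dump(mentionsCSharp('abc#'));            // bool(false): no boundary before 'c'
```

Note that this pattern still accepts inputs such as "C#7", because a digit is outside a-z. If that is not wanted, a stricter lookahead such as (?!\w) may be a better fit.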
