
Find substring with special characters

I have the pattern 'šalotka 29%' and I need to know whether the string 'something something šalotka 29% something' contains it, but it should not match when the pattern is part of a longer word, as in 'something something šalotka 29%something'.

I tried mb_eregi('\b' . $pattern . '\b', $string), but it doesn't work because regex word boundaries don't work with special characters. Any suggestions?

A word boundary matches only between a word character (a character from the \w character class) and either a non-word character or the start or end of the string.

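If the string you are searching for starts or ends with a non-word character, you can't use a word boundary on that side. Here the pattern ends with %, so a trailing \b matches only when a word character follows, which is the opposite of what is wanted.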
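To make the failure concrete, here is a minimal sketch that applies \b to the two strings from the question. It uses preg_match with the u modifier instead of mb_eregi, purely so that it does not depend on the mbstring extension; the variable names are illustrative.

```php
<?php
// The pattern from the question; preg_quote leaves it unchanged here
// because it contains no regex metacharacters.
$pattern = preg_quote('šalotka 29%', '~');
$regex   = '~\b' . $pattern . '\b~ui';

// The trailing \b sits between '%' and ' ', two non-word characters,
// so there is no boundary and the wanted match is missed.
$standalone = preg_match($regex, 'something something šalotka 29% something');

// The trailing \b sits between '%' and 's', which is a boundary,
// so the unwanted match succeeds.
$glued = preg_match($regex, 'something something šalotka 29%something');

var_dump($standalone, $glued); // int(0) then int(1)
```

The result is exactly backwards from what the question asks for, which is why a different delimiter than \b is needed.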

The difficulty is to define precisely what separates the desired string from the rest; in other words, it is your choice. Whatever you choose, the technique is the same: put lookarounds before and after your string to state what you don't want around it, a negative lookbehind (?<!...) and a negative lookahead (?!...).

Example:

  • to forbid anything that isn't whitespace around the string:
mb_eregi('(?<!\S)' . $item . '(?!\S)', $string, $match);
  • to forbid word characters around the string:
mb_eregi('(?<!\w)' . $item . '(?!\w)', $string, $match);
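The two choices give different answers as soon as punctuation touches the pattern. The sketch below compares them on a made-up string in which the pattern is wrapped in parentheses; it uses preg_match with the same lookarounds rather than mb_eregi, and the variable names are illustrative.

```php
<?php
$item   = preg_quote('šalotka 29%', '~');
$string = 'buy (šalotka 29%) now'; // hypothetical input

// Whitespace-only neighbours: '(' before the pattern is non-whitespace,
// so the negative lookbehind rejects the match.
$strict = preg_match('~(?<!\S)' . $item . '(?!\S)~ui', $string);

// No word characters as neighbours: '(' and ')' are not word characters,
// so both lookarounds pass and the match succeeds.
$loose = preg_match('~(?<!\w)' . $item . '(?!\w)~ui', $string);

var_dump($strict, $loose); // int(0) then int(1)
```

Which of the two is right depends on whether punctuation next to the pattern should count as a separator in your data.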

Full example:

$item = 'šalotka 29%';
$string = 'something something šalotka 29% something';

mb_regex_encoding('UTF-8'); // be sure to use the correct encoding

// if needed escape regex special characters
$item = mb_eregi_replace('[\[\](){}.\\\\|$^?+*#-]', '\\\0', $item);

mb_eregi('(?<!\S)' . $item . '(?!\S)', $string, $matches);

print_r($matches);
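As a quick check of the behaviour against both strings from the question, the same lookarounds can be wrapped in a small function. This is only a sketch: containsStandalone is a hypothetical name, and it relies on preg_match and preg_quote instead of the mb_ereg functions.

```php
<?php
// Hypothetical helper: true when $needle occurs in $haystack with
// whitespace (or the start/end of the string) on both sides.
function containsStandalone(string $needle, string $haystack): bool
{
    // preg_quote escapes regex metacharacters and the '~' delimiter.
    $quoted = preg_quote($needle, '~');
    return preg_match('~(?<!\S)' . $quoted . '(?!\S)~ui', $haystack) === 1;
}

$item = 'šalotka 29%';
var_dump(containsStandalone($item, 'something something šalotka 29% something')); // bool(true)
var_dump(containsStandalone($item, 'something something šalotka 29%something'));  // bool(false)
```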

Notes:

  • Although the ereg functions are obsolete and have been removed from recent PHP versions (PHP 7.0 onward), the mb_ereg functions, based on the Oniguruma regex engine, still exist and offer features not available in the preg_ functions (PCRE).

  • Obviously, for this question you can do the same with preg_match:

preg_match('~(?<!\S)' . $item . '(?!\S)~ui', $string, $match);
  • If you don't control the string you search for (user input, for example), make sure it doesn't contain special regex characters.
    With preg_ functions you can use preg_quote to escape them, but it's also possible to do it yourself with $item = mb_ereg_replace('[\[\](){}.\\\\|$^?+*#-]', '\\\0', $item); which suffices for most of the syntaxes available in mb_ereg functions (note that escaping all non-word characters does the job too). Feel free to write your own if you want to deal with Emacs or BRE syntaxes.
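To see why the escaping matters, consider a search string that contains a regex metacharacter. The sketch below uses a made-up input, '1+1=2', and compares the raw and the quoted pattern with preg_match; passing the delimiter as the second argument of preg_quote ensures it is escaped too.

```php
<?php
$item = '1+1=2'; // hypothetical user input containing '+'

$raw    = '~(?<!\S)' . $item . '(?!\S)~u';
$quoted = '~(?<!\S)' . preg_quote($item, '~') . '(?!\S)~u';

// Unescaped, '1+' means "one or more 1", so the literal text is missed
// and an unrelated string matches instead.
$rawMissesLiteral = preg_match($raw, 'sum 1+1=2 ok');
$rawFalsePositive = preg_match($raw, 'sum 111=2 ok');

// Escaped, only the literal text matches.
$quotedFindsLiteral = preg_match($quoted, 'sum 1+1=2 ok');
$quotedRejectsOther = preg_match($quoted, 'sum 111=2 ok');

var_dump($rawMissesLiteral, $rawFalsePositive);     // int(0) then int(1)
var_dump($quotedFindsLiteral, $quotedRejectsOther); // int(1) then int(0)
```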
