
find substring with special characters

I have the pattern 'šalotka 29%' and I need to know whether the string 'something something šalotka 29% something' contains it, but not when the pattern is part of a longer word, as in 'something something šalotka 29%something'.

I tried mb_eregi('\b' . $pattern . '\b', $string), but it doesn't work because regex word boundaries don't behave as expected next to special characters. Any suggestions?

A word boundary matches only between a word character (a character from the \w character class) and a non-word character or the limit of the string.

If your searched string starts or ends with a non-word character, you can't use a word-boundary.
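To make this concrete, here is a minimal sketch of how \b behaves after a '%'. It uses preg_match (PCRE) rather than mb_eregi, and the subject strings are invented for illustration; the word-boundary rule itself is the same.

```php
<?php
// \b only matches where a word character sits on exactly one side.
$pattern = '~\b29%\b~';

// '%' followed by a space: both are non-word characters, so the
// trailing \b cannot match and the standalone occurrence is missed.
$standalone = preg_match($pattern, 'price 29% off');

// '%' followed by 'o': non-word then word, so the trailing \b matches
// and the glued occurrence is found instead.
$glued = preg_match($pattern, 'price 29%off');

var_dump($standalone, $glued); // int(0) then int(1)
```

In other words, with a pattern ending in '%' the trailing \b gives the opposite of what the question wants.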

The difficulty is to define precisely what separates the desired string from the rest. In other words, it is your choice. Whatever you choose, the technique is the same: use lookarounds before and after your string to describe what you don't want around it, with a negative lookbehind (?<!...) and a negative lookahead (?!...).

Example:

  • to forbid anything that isn't whitespace around the string:
mb_eregi('(?<!\S)' . $item . '(?!\S)', $string, $match);
  • to forbid word characters around the string:
mb_eregi('(?<!\w)' . $item . '(?!\w)', $string, $match);
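The two choices differ when punctuation touches the string. The sketch below compares them on a few made-up subjects; it uses preg_match with the u and i modifiers instead of mb_eregi, and the variable names ($patterns, $subjects, $results) are only illustrative.

```php
<?php
$item = 'šalotka 29%'; // contains no regex metacharacters, so it is used unescaped here

$patterns = [
    'whitespace' => '~(?<!\S)' . $item . '(?!\S)~ui', // neighbours must be whitespace or string limits
    'word'       => '~(?<!\w)' . $item . '(?!\w)~ui', // neighbours must not be word characters
];

$subjects = [
    'x šalotka 29% y',  // surrounded by spaces:  whitespace 1, word 1
    'x šalotka 29%, y', // followed by a comma:   whitespace 0, word 1
    'x šalotka 29%y',   // glued to a letter:     whitespace 0, word 0
];

$results = [];
foreach ($subjects as $subject) {
    $results[$subject] = [
        preg_match($patterns['whitespace'], $subject),
        preg_match($patterns['word'], $subject),
    ];
    printf("%s => whitespace: %d, word: %d\n", $subject, ...$results[$subject]);
}
```

The whitespace variant is the stricter one; the word-character variant tolerates adjacent punctuation such as a comma or a closing parenthesis.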

Full example:

$item = 'šalotka 29%';
$string = 'something something šalotka 29% something';

mb_regex_encoding('UTF-8'); // be sure to use the correct encoding

// if needed escape regex special characters
$item = mb_eregi_replace('[\[\](){}.\\\\|$^?+*#-]', '\\\0', $item);

mb_eregi('(?<!\S)' . $item . '(?!\S)', $string, $matches);

print_r($matches);

Notes:

  • Although the ereg functions are obsolete and were removed in PHP 7.0, the mb_ereg functions, which are based on the Oniguruma regex engine, still exist and offer features not available in the preg_ functions (PCRE).

  • For this particular question, you can do the same with preg_match:

preg_match('~(?<!\S)' . $item . '(?!\S)~ui', $string, $match);
  • If you don't control the searched string (user input, for example), make sure it doesn't contain special regex characters.
    With the preg_ functions you can use preg_quote to escape them, but you can also do it yourself with $item = mb_ereg_replace('[\[\](){}.\\\\|$^?+*#-]', '\\\0', $item); which suffices for most of the syntaxes available in the mb_ereg functions (note that escaping all non-word characters does the job too). Feel free to write your own if you need to handle the Emacs or BRE syntaxes.
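As a rough sketch of the preg_ route, escaping and matching can be wrapped together. The function name containsStandalone is hypothetical, and the whitespace lookarounds are just one of the possible choices discussed above.

```php
<?php
/**
 * Hypothetical helper: true when $needle occurs in $haystack and is
 * delimited by whitespace or the limits of the string.
 */
function containsStandalone(string $needle, string $haystack): bool
{
    // preg_quote escapes regex metacharacters; the second argument
    // also escapes the '~' delimiter used below.
    $quoted = preg_quote($needle, '~');

    return preg_match('~(?<!\S)' . $quoted . '(?!\S)~ui', $haystack) === 1;
}

var_dump(containsStandalone('šalotka 29%', 'something something šalotka 29% something')); // bool(true)
var_dump(containsStandalone('šalotka 29%', 'something something šalotka 29%something'));  // bool(false)
var_dump(containsStandalone('a.b (50%)', 'buy a.b (50%) now'));                           // bool(true)
```

Without the preg_quote call, the '.' and the parentheses in the last needle would be interpreted as regex syntax instead of literal characters.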
