PHP - quick regular expression question

I am trying to match a word in a wall of text and return a few words before and after the match. Everything is working, but I would like to ask whether there is a way to modify it so that it also finds similar words. Let me show you an example:

preg_match_all('/(?:\b(\w+\s+)\{1,5})?.*(pripravená)(?:(\s+){1,2}\b.{1,10})?/u', $item, $res[$file]);

This code returns a match, but I would like to modify it so that

preg_match_all('/(?:\b(\w+\s+)\{1,5})?.*(pripravena)(?:(\s+){1,2}\b.{1,10})?/u', $item, $res[$file]);

would also return a match. The text is in Slovak. I tried a range of Unicode characters and also \p{Sk} (and a few others), but to no avail. Maybe I just put it in the wrong place; I don't know.

Is something like this possible?

Any help is appreciated.

I don't know if there is an "ignore accents" switch, but you could rewrite your search query with something like:

$query = 'pripravená';
// The u modifier makes each class match whole UTF-8 characters, not single bytes.
$query = preg_replace(
  array('=[áàâa]=iu','=[óòôo]=iu','=[úùûu]=iu'),
  array( '[áàâa]'   , '[óòôo]'   , '[úùûu]'   ),
  $query
);
preg_match_all('/(?:\b(\w+\s+)\{1,5})?.*('.$query.')(?:(\s+){1,2}\b.{1,10})?/u', $item, $res[$file]);

That would convert your 'pripravená' query into 'pripr[áàâa]ven[áàâa]', so each a-like vowel matches with or without an accent.
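The same idea can be sketched as a small helper that walks the query one UTF-8 character at a time. The function name accentInsensitivePattern and the variant lists are assumptions made for illustration, not part of PHP, and the source file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper: replace every vowel (accented or not) in the query
// with a character class listing its variants. The lists are illustrative.
function accentInsensitivePattern(string $query): string
{
    $classes = array(
        '[aáàâä]',
        '[eéèê]',
        '[iíìî]',
        '[oóòôö]',
        '[uúùû]',
        '[yý]',
    );
    $out = '';
    // Split into UTF-8 characters so multi-byte letters stay intact.
    foreach (preg_split('//u', $query, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
        $replacement = preg_quote($ch, '/'); // default: the literal character
        foreach ($classes as $class) {
            // Does this character belong to the class (case-insensitively)?
            if (preg_match('/^' . $class . '$/iu', $ch) === 1) {
                $replacement = $class;
                break;
            }
        }
        $out .= $replacement;
    }
    return $out;
}

$pattern = '/(' . accentInsensitivePattern('pripravená') . ')/iu';
var_dump(preg_match($pattern, 'Večera je pripravena.')); // accent missing in the text
var_dump(preg_match($pattern, 'Večera je pripravená.')); // accent present in the text
```

Building the pattern character by character avoids the byte-level surprises that come from running a regex replacement over a multi-byte string without the u modifier.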

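Use (pripraven[áa]) or (pripravena\p{M}*) or, more likely, some combination of these approaches. The first matches the precomposed letter á; the second matches a plain a followed by any combining marks, which only helps if the text is stored in decomposed form.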

I don't know of any other, more concise, way of specifying "all Latin-1 vowels that are similar to 'a' in my current locale".
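To illustrate why the \p{M}* variant may appear to do nothing, here is a minimal sketch comparing three spellings of the same word. The variable names are made up for the example. If the intl extension is available, Normalizer::normalize($text, Normalizer::FORM_D) should produce the decomposed form from ordinary text, but that is an assumption about your setup.

```php
<?php
// \p{M} matches Unicode combining marks; it needs the u modifier.
$pattern = '/(pripravena\p{M}*)/u';

$plain       = "pripravena";          // no accent at all
$decomposed  = "pripravena\u{0301}";  // "a" followed by a combining acute accent
$precomposed = "pripraven\u{00E1}";   // single code point U+00E1

$matchesPlain       = preg_match($pattern, $plain);
$matchesDecomposed  = preg_match($pattern, $decomposed);
$matchesPrecomposed = preg_match($pattern, $precomposed);

var_dump($matchesPlain, $matchesDecomposed, $matchesPrecomposed);
```

Only the precomposed spelling fails to match, which is why a character class such as [áa] is still needed for text stored that way.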

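You could use strtr() to strip out the accents. See the PHP manual page for a good example: http://php.net/manual/en/function.strtr.php

With UTF-8 strings, pass the replacements as an array, because the three-argument form translates byte by byte and would corrupt multi-byte characters:

$addr = strtr($addr, array('ä' => 'a', 'å' => 'a', 'ö' => 'o'));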

You'd still need to specify all the relevant characters, but it would be easier than using a regex to do it.
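As a rough sketch of how this could be applied to the original problem, the accents can be stripped from both the text and the query before matching. The helper name stripAccents and the replacement map are assumptions for illustration; a real map would need every accented letter that can occur in the data.

```php
<?php
// Hypothetical helper: map accented letters to their base letters.
// The array form of strtr() replaces whole substrings, so it is UTF-8 safe.
function stripAccents(string $s): string
{
    return strtr($s, array(
        'á' => 'a', 'ä' => 'a', 'é' => 'e', 'í' => 'i',
        'ó' => 'o', 'ô' => 'o', 'ú' => 'u', 'ý' => 'y',
        'č' => 'c', 'š' => 's', 'ž' => 'z',
    ));
}

$item  = 'Večera je už pripravená na stole.';
$query = 'pripravena';

// Match the stripped query against the stripped text.
$found = preg_match(
    '/\b' . preg_quote(stripAccents($query), '/') . '\b/u',
    stripAccents($item),
    $m
);
var_dump($found, $m);
```

Note that the match is taken from the stripped copy, so the captured words lose their accents and byte offsets no longer line up with the original text.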
