
Regex optional word in regex

I am trying to search some arrays with regular expressions. Some words should be "optional": they are not required, but a string that contains them should rank higher (for relevancy).

Here is my attempt:

preg_match_all('/(?:animal)? (lamina)/', $searchExpression, $matches);

It does not work, though. What I am trying to achieve is this: the string must contain lamina and may contain animal. A string that contains both animal and lamina should have better relevancy than one that contains only lamina.

How can I fix the regex? And how do I sort the matches to see which one "matches" best?

For example:

$animalStuff = array('animal lamina', 'lamina', 'animal');

The first two items should match; the third should not. And animal lamina should probably be the most relevant. How do I compute the relevancy?

$animalStuff = array('animal lamina', 'lamina', 'animal');

$results = array();

foreach ($animalStuff as $searchExpression)
{
    preg_match_all('/(?:animal)? (lamina)/', $searchExpression, $matches);

    var_dump($matches);

    // Do something here to decide if it should be in the top of array, etc.
}

Counting the number of matches could probably give me the best relevancy, I assume, but I need to get the regex working first to try that out.

You can use the following:

preg_match_all('/(animal.*?lamina)|(lamina)/', $searchExpression, $matches);

See DEMO
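
A note on why the original pattern fails, plus a rough sketch of how the pattern above could drive the ranking. In /(?:animal)? (lamina)/ the space sits outside the optional group, so a space before lamina is always required and the bare string lamina cannot match. With the alternation above, capture group 1 is filled only when the animal branch matched, which gives a simple score. The function name laminaRelevance and the 2/1/0 scale are made up for illustration; they are not part of the question or of PHP.

```php
<?php
// Hypothetical helper: 2 = "animal ... lamina", 1 = only "lamina", 0 = no match.
function laminaRelevance(string $subject): int
{
    if (!preg_match('/(animal.*?lamina)|(lamina)/', $subject, $m)) {
        return 0;
    }
    // Group 1 is non-empty only when the "animal ... lamina" branch matched.
    return $m[1] !== '' ? 2 : 1;
}

$animalStuff = array('lamina', 'animal', 'animal lamina');

$scores = array();
foreach ($animalStuff as $searchExpression) {
    $score = laminaRelevance($searchExpression);
    if ($score > 0) {
        // Keep only the strings that matched at all.
        $scores[$searchExpression] = $score;
    }
}

// Sort by score, highest first, keeping the strings as keys.
arsort($scores);
$ranked = array_keys($scores);
```

Note that a string such as lamina animal would score 1 here, because the first branch expects animal to come before lamina.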

(this should probably be a comment)

There are several problems here, most of which go away if you stop using regular expressions to find the matches. So why the requirement to use regexes?

For example, consider:

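function matchwords($allwords, $requiredwords, $subject)
{
   // Replace every non-word character with a space, then split into words.
   $subject=preg_replace("/\W/", ' ', $subject);
   $subject=explode(' ', $subject);
   // Only score subjects that contain at least one required word.
   if (count(array_intersect($requiredwords, $subject))) {
      // Relevancy = number of known words (required or optional) found.
      return count(array_intersect($allwords, $subject));
   }
   return 0;
}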

Try benchmarking it; it may actually be faster than using regexes. With large word sets, inverting the arrays with array_flip() and using array_intersect_key() will likely be faster.
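
The inverted-array idea might look roughly like the sketch below. It is only one way to do it: the name matchwordsByKey is hypothetical, and splitting with preg_split() is a swapped-in detail rather than what the function above does. Whether it is actually faster would need measuring on real data.

```php
<?php
// Hypothetical variant of matchwords() that compares array keys, not values.
function matchwordsByKey(array $allwords, array $requiredwords, string $subject): int
{
    // Split on runs of non-word characters and drop empty pieces.
    $tokens = preg_split('/\W+/', $subject, -1, PREG_SPLIT_NO_EMPTY);
    // Invert: each word becomes a key, so lookups are hash-based.
    $seen = array_flip($tokens);

    // Require at least one required word, as in the function above.
    if (count(array_intersect_key(array_flip($requiredwords), $seen)) === 0) {
        return 0;
    }
    // Relevancy = number of known words present in the subject.
    return count(array_intersect_key(array_flip($allwords), $seen));
}

$all = array('animal', 'lamina');
$required = array('lamina');

$scores = array();
foreach (array('lamina', 'animal', 'animal lamina') as $s) {
    $scores[$s] = matchwordsByKey($all, $required, $s);
}
```

In a real search, the flipped copies of the word lists could be built once outside the loop instead of on every call.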
