Generate all possible matches for regex pattern in PHP

There are quite a few questions on SO asking how to parse a regex pattern and output all possible matches to that pattern. For some reason, though, every single one I can find (1, 2, 3, 4, 5, 6, 7, probably more) is either for Java or some variety of C (and just one for JavaScript), and I currently need to do this in PHP.

I've Googled to my heart's (dis)content, but whatever I do, pretty much the only thing that Google gives me is links to the docs for preg_match() and pages about how to use regex, which is the opposite of what I want here.

My regex patterns are all very simple and guaranteed to be finite; the only syntax used is:

  • [] for character classes
  • () for subgroups (capturing not required)
  • | (pipe) for alternative matches within subgroups
  • ? for zero-or-one matches

So an example might be [ct]hun(k|der)(s|ed|ing)? to match all possible forms of the verbs chunk, thunk, chunder and thunder, for a total of sixteen permutations.
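The total of sixteen comes from multiplying the number of choices at each variable position: two for [ct], two for (k|der), and four for (s|ed|ing)? (three suffixes plus the empty string). A minimal check of that arithmetic, with the option lists written out by hand rather than parsed from the pattern:

```php
<?php
// Option lists for '[ct]hun(k|der)(s|ed|ing)?', written out by hand.
// The trailing '' stands for the "zero occurrences" case of the '?' quantifier.
$choices = [
    ['c', 't'],
    ['k', 'der'],
    ['s', 'ed', 'ing', ''],
];

// The number of distinct strings is the product of the list sizes.
$total = array_product(array_map('count', $choices));

echo $total, PHP_EOL; // 16
```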

Ideally, there'd be a library or tool for PHP which will iterate through (finite) regex patterns and output all possible matches, all ready to go. Does anyone know if such a library/tool already exists?

If not, what is an optimised way to approach making one? This answer for JavaScript is the closest I've been able to find to something I should be able to adapt, but unfortunately I just can't wrap my head around how it actually works, which makes adapting it more tricky. Plus there may well be better ways of doing it in PHP anyway. Some logical pointers as to how the task would best be broken down would be greatly appreciated.

Edit: Since apparently it wasn't clear how this would look in practice, I am looking for something that will allow this type of input:

$possibleMatches = parseRegexPattern('[ct]hun(k|der)(s|ed|ing)?');

– and printing $possibleMatches should then give something like this (the order of the elements is not important in my case):

Array
(
    [0] => chunk
    [1] => thunk
    [2] => chunks
    [3] => thunks
    [4] => chunked
    [5] => thunked
    [6] => chunking
    [7] => thunking
    [8] => chunder
    [9] => thunder
    [10] => chunders
    [11] => thunders
    [12] => chundered
    [13] => thundered
    [14] => chundering
    [15] => thundering
)

Method

  1. You need to strip out the variable patterns; you can use preg_match_all to do this

    preg_match_all("/(\[\w+\]|\([\w|]+\))/", '[ct]hun(k|der)(s|ed|ing)?', $matches);

    /* Regex:

    /(\[\w+\]|\([\w|]+\))/
    /           : Pattern delimiter
    (           : Start of capture group
    \[\w+\]     : Character class pattern
    |           : OR operator
    \([\w|]+\)  : Capture group pattern
    )           : End of capture group
    /           : Pattern delimiter
    */

  2. You can then expand the capture groups to letters or words (depending on type)

     $array = str_split($cleanString, 1); // For a character class
     $array = explode("|", $cleanString); // For a capture group

  3. Recursively work your way through each $array
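To make steps 1 and 2 concrete, here is a small sketch that prints what each extracted token expands to. It assumes the variant of the pattern that also captures a trailing ?, and the variable names ($expanded, $inner) are only for illustration:

```php
<?php
$regex = '[ct]hun(k|der)(s|ed|ing)?';

// Step 1: pull out every character class or group, with its optional '?'.
preg_match_all('/(\[\w+\]|\([\w|]+\))\??/', $regex, $matches);

// Step 2: turn each token into the list of strings it can stand for.
$expanded = [];
foreach ($matches[0] as $token) {
    $inner   = trim($token, '[]()?');
    $options = $token[0] === '[' ? str_split($inner) : explode('|', $inner);
    if (substr($token, -1) === '?') {
        $options[] = ''; // the "zero occurrences" case
    }
    $expanded[$token] = $options;
}

foreach ($expanded as $token => $options) {
    echo $token, ' => ', json_encode($options), PHP_EOL;
}
// [ct] => ["c","t"]
// (k|der) => ["k","der"]
// (s|ed|ing)? => ["s","ed","ing",""]
```

Step 3 then only has to combine one entry from each list with the literal text in between.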

Code

function printMatches($pattern, $array, $matchPattern)
{
    $currentArray = array_shift($array);

    foreach ($currentArray as $option) {
        $patternModified = preg_replace($matchPattern, $option, $pattern, 1);
        if (!count($array)) {
            echo $patternModified, PHP_EOL;
        } else {
            printMatches($patternModified, $array, $matchPattern);
        }
    }
}

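function prepOptions($matches)
{
    $possibilities = [];

    foreach ($matches as $match) {
        // Drop the brackets and the optional marker, leaving only the choices
        $cleanString = preg_replace("/[\[\]\(\)\?]/", "", $match);

        if ($match[0] === "[") {
            $array = str_split($cleanString, 1);
        } elseif ($match[0] === "(") {
            $array = explode("|", $cleanString);
        }
        // A trailing "?" makes the empty string a valid option too
        // (negative string offsets need PHP 7.1+)
        if ($match[-1] === "?") {
            $array[] = "";
        }
        $possibilities[] = $array;
    }
    return $possibilities;
}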

$regex        = '[ct]hun(k|der)(s|ed|ing)?';
$matchPattern = "/(\[\w+\]|\([\w|]+\))\??/";

preg_match_all($matchPattern, $regex, $matches);

printMatches(
    $regex,
    prepOptions($matches[0]),
    $matchPattern
);
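The code above prints each match. If you want the array from the question instead, one possible variant is sketched below. It is not the code above: it splits the pattern with preg_split and builds the strings up iteratively, so nothing is re-scanned after a substitution. The function name parseRegexPattern is taken from the call in the question and is hypothetical, and the sketch only supports the flat (non-nested) syntax of the first example.

```php
<?php
// Hypothetical helper named after the call in the question; not a library function.
function parseRegexPattern(string $pattern): array
{
    // With PREG_SPLIT_DELIM_CAPTURE the result alternates:
    // even indexes hold literal text, odd indexes hold the variable tokens.
    $parts = preg_split(
        '/(\[\w+\]\??|\([\w|]+\)\??)/',
        $pattern,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    $results = [''];
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            $options = [$part]; // literal text: exactly one choice
        } else {
            $inner   = trim($part, '[]()?');
            $options = $part[0] === '[' ? str_split($inner) : explode('|', $inner);
            if (substr($part, -1) === '?') {
                $options[] = ''; // the "zero occurrences" case
            }
        }

        // Extend every string built so far with every option.
        $next = [];
        foreach ($results as $prefix) {
            foreach ($options as $option) {
                $next[] = $prefix . $option;
            }
        }
        $results = $next;
    }

    return $results;
}

$possibleMatches = parseRegexPattern('[ct]hun(k|der)(s|ed|ing)?');
print_r($possibleMatches);
```

This returns sixteen strings for the example pattern, in a different order from the listing in the question.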

Additional functionality

Expanding nested groups

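In use, you would run this on $regex before the preg_match_all call, so that nested groups are flattened first. Note that it only handles a group nested directly after a word prefix, as in be(come)? or hav(e|ing).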

$regex        = 'This happen(s|ed) to (be(come)?|hav(e|ing)) test case 1?';

echo preg_replace_callback("/(\(|\|)(\w+)(?:\(([\w\|]+)\)\??)/", function($array){
    $output = explode("|", $array[3]);
    if ($array[0][-1] === "?") {
        $output[] = "";
    }
    foreach ($output as &$option) {
        $option = $array[2] . $option;
    }
    return $array[1] . implode("|", $output);
}, $regex), PHP_EOL;

Output:

This happen(s|ed) to (become|be|have|having) test case 1?

Matching single letters

To also match a single optional character (such as the 1? in the example above), the bones of it would be to update the regex:

$matchPattern = "/(?:(\[\w+\]|\([\w|]+\))\??|(\w\?))/";

and add an else to the prepOptions function:

} else {
    $array = [$cleanString];
}

Full working example

function printMatches($pattern, $array, $matchPattern)
{
    $currentArray = array_shift($array);

    foreach ($currentArray as $option) {
        $patternModified = preg_replace($matchPattern, $option, $pattern, 1);
        if (!count($array)) {
            echo $patternModified, PHP_EOL;
        } else {
            printMatches($patternModified, $array, $matchPattern);
        }
    }
}

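function prepOptions($matches)
{
    $possibilities = [];

    foreach ($matches as $match) {
        // Drop the brackets and the optional marker, leaving only the choices
        $cleanString = preg_replace("/[\[\]\(\)\?]/", "", $match);

        if ($match[0] === "[") {
            $array = str_split($cleanString, 1);
        } elseif ($match[0] === "(") {
            $array = explode("|", $cleanString);
        } else {
            // A single optional character such as "1?"
            $array = [$cleanString];
        }
        // A trailing "?" makes the empty string a valid option too
        // (negative string offsets need PHP 7.1+)
        if ($match[-1] === "?") {
            $array[] = "";
        }
        $possibilities[] = $array;
    }
    return $possibilities;
}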

$regex        = 'This happen(s|ed) to (be(come)?|hav(e|ing)) test case 1?';
$matchPattern = "/(?:(\[\w+\]|\([\w|]+\))\??|(\w\?))/";

$regex = preg_replace_callback("/(\(|\|)(\w+)(?:\(([\w\|]+)\)\??)/", function($array){
    $output = explode("|", $array[3]);
    if ($array[0][-1] === "?") {
        $output[] = "";
    }
    foreach ($output as &$option) {
        $option = $array[2] . $option;
    }
    return $array[1] . implode("|", $output);
}, $regex);


preg_match_all($matchPattern, $regex, $matches);

printMatches(
    $regex,
    prepOptions($matches[0]),
    $matchPattern
);

Output:

This happens to become test case 1
This happens to become test case 
This happens to be test case 1
This happens to be test case 
This happens to have test case 1
This happens to have test case 
This happens to having test case 1
This happens to having test case 
This happened to become test case 1
This happened to become test case 
This happened to be test case 1
This happened to be test case 
This happened to have test case 1
This happened to have test case 
This happened to having test case 1
This happened to having test case 
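If you need groups nested more than one level deep, a different technique may be easier to extend: a small recursive-descent parser that walks the pattern one character at a time. The sketch below is an alternative to the code above, not a refinement of it. The function names expandAlternation and expandSequence are made up for illustration, and it assumes the pattern is well formed (balanced brackets, only the four syntax elements from the question).

```php
<?php
// Hypothetical helpers for illustration only.

// Parses alternatives separated by '|' until ')' or the end of the pattern.
function expandAlternation(string $p, int &$i): array
{
    $results = expandSequence($p, $i);
    while ($i < strlen($p) && $p[$i] === '|') {
        $i++; // skip '|'
        $results = array_merge($results, expandSequence($p, $i));
    }
    return $results;
}

// Parses consecutive atoms, combining each one with the strings built so far.
function expandSequence(string $p, int &$i): array
{
    $results = [''];
    while ($i < strlen($p) && $p[$i] !== '|' && $p[$i] !== ')') {
        if ($p[$i] === '(') {
            $i++;                                 // skip '('
            $options = expandAlternation($p, $i); // recurse into the group
            $i++;                                 // skip ')'
        } elseif ($p[$i] === '[') {
            $end     = strpos($p, ']', $i);
            $options = str_split(substr($p, $i + 1, $end - $i - 1));
            $i       = $end + 1;
        } else {
            $options = [$p[$i++]];                // a literal character
        }
        if ($i < strlen($p) && $p[$i] === '?') {
            $options[] = '';                      // zero occurrences
            $i++;
        }
        $next = [];
        foreach ($results as $prefix) {
            foreach ($options as $option) {
                $next[] = $prefix . $option;
            }
        }
        $results = $next;
    }
    return $results;
}

$pos     = 0;
$matches = expandAlternation('This happen(s|ed) to (be(come)?|hav(e|ing)) test case 1?', $pos);

echo count($matches), PHP_EOL; // 16
echo $matches[0], PHP_EOL;     // This happens to become test case 1
```

Because each group is expanded by a recursive call, the nested-group preprocessing step is not needed here. Duplicate results are possible for patterns such as (a|a); pass the result through array_unique if that matters.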
