
Generate all possible matches for regex pattern in PHP

There are quite a few questions on SO asking how to parse a regex pattern and output all possible matches to that pattern. For some reason, though, every single one of them I can find (1, 2, 3, 4, 5, 6, 7, probably more) is either for Java or some variety of C (and just one for JavaScript), and I currently need to do this in PHP.

I've Googled to my heart's (dis)content, but whatever I do, pretty much the only thing Google gives me is links to the docs for preg_match() and pages about how to use regex, which is the opposite of what I want here.

My regex patterns are all very simple and guaranteed to be finite; the only syntax used is:

  • [] for character classes
  • () for subgroups (capturing not required)
  • | (pipe) for alternative matches within subgroups
  • ? for zero-or-one matches

So an example might be [ct]hun(k|der)(s|ed|ing)? to match all possible forms of the verbs chunk, thunk, chunder and thunder, for a total of sixteen permutations.
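For the count: each variable part contributes a fixed number of options ([ct] gives two, (k|der) gives two, and (s|ed|ing)? gives four once the empty string allowed by ? is counted), so the total is their product, 2 × 2 × 4 = 16. A quick PHP check, with the option lists written out by hand rather than parsed:

```php
<?php
// Option lists for [ct]hun(k|der)(s|ed|ing)?, written out by hand.
// The trailing "?" on the last group adds "" (the group left out) as a fourth option.
$options = [
    ['c', 't'],
    ['k', 'der'],
    ['s', 'ed', 'ing', ''],
];

// Total number of matches = product of the option counts per part.
$total = array_product(array_map('count', $options));
echo $total, PHP_EOL; // 16
```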

Ideally, there'd be a library or tool for PHP that iterates through (finite) regex patterns and outputs all possible matches, all ready to go. Does anyone know if such a library/tool already exists?

If not, what is an optimised way to approach making one? This answer for JavaScript is the closest I've been able to find to something I should be able to adapt, but unfortunately I just can't wrap my head around how it actually works, which makes adapting it trickier. Plus there may well be better ways of doing it in PHP anyway. Some logical pointers as to how the task would best be broken down would be greatly appreciated.

Edit: Since apparently it wasn't clear how this would look in practice, I am looking for something that will allow this type of input:

$possibleMatches = parseRegexPattern('[ct]hun(k|der)(s|ed|ing)?');

Printing $possibleMatches should then give something like this (the order of the elements is not important in my case):

Array
(
    [0] => chunk
    [1] => thunk
    [2] => chunks
    [3] => thunks
    [4] => chunked
    [5] => thunked
    [6] => chunking
    [7] => thunking
    [8] => chunder
    [9] => thunder
    [10] => chunders
    [11] => thunders
    [12] => chundered
    [13] => thundered
    [14] => chundering
    [15] => thundering
)

Method

  1. First, pull out the variable parts of the pattern; you can use preg_match_all to do this:

     preg_match_all("/(\[\w+\]|\([\w|]+\))/", '[ct]hun(k|der)(s|ed|ing)?', $matches);

     /* Regex: /(\[\w+\]|\([\w|]+\))/
        /          : Pattern delimiter
        (          : Start of capture group
        \[\w+\]    : Character class pattern
        |          : OR operator
        \([\w|]+\) : Subgroup pattern
        )          : End of capture group
        /          : Pattern delimiter
     */
  2. Then expand each extracted part into letters or words, depending on its type ($cleanString is the part with its brackets and parentheses removed):

     $array = str_split($cleanString, 1); // For a character class
     $array = explode("|", $cleanString); // For a subgroup
  3. Recursively work your way through each $array.
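As a sketch of steps 1 and 2 on the example pattern, assuming a trailing ? on a part adds the empty string as one more option, the variable parts become a list of option arrays that step 3 then walks through:

```php
<?php
$pattern = '[ct]hun(k|der)(s|ed|ing)?';

// Step 1: pull out each variable part (class or group), plus an optional trailing "?".
preg_match_all('/(\[\w+\]|\([\w|]+\))\??/', $pattern, $matches);

// Step 2: expand each part into its list of alternatives.
$options = [];
foreach ($matches[0] as $part) {
    $clean = str_replace(['[', ']', '(', ')', '?'], '', $part);
    // Character class: one option per letter; subgroup: one option per alternative.
    $choices = $part[0] === '[' ? str_split($clean) : explode('|', $clean);
    if (substr($part, -1) === '?') {
        $choices[] = ''; // "?" means the part may also be left out
    }
    $options[] = $choices;
}

print_r($options);
```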

Code

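// Recursively substitute each option for the first remaining variable part
// of $pattern, echoing the result once every part has been replaced.
function printMatches($pattern, $array, $matchPattern)
{
    $currentArray = array_shift($array);

    foreach ($currentArray as $option) {
        // Limit of 1: replace only the leftmost remaining variable part.
        $patternModified = preg_replace($matchPattern, $option, $pattern, 1);
        if (!count($array)) {
            echo $patternModified, PHP_EOL;
        } else {
            printMatches($patternModified, $array, $matchPattern);
        }
    }
}

// Turn each matched part ("[ct]", "(k|der)", "(s|ed|ing)?") into its list of options.
function prepOptions($matches)
{
    $possibilities = [];
    foreach ($matches as $match) {
        // Strip brackets, parentheses and "?" to leave the bare options.
        $cleanString = preg_replace("/[\[\]\(\)\?]/", "", $match);

        if ($match[0] === "[") {
            $array = str_split($cleanString, 1); // character class: one option per letter
        } elseif ($match[0] === "(") {
            $array = explode("|", $cleanString); // subgroup: one option per alternative
        }
        // Negative string offsets need PHP 7.1+.
        if ($match[-1] === "?") {
            $array[] = "";                       // optional part: it may also be left out
        }
        $possibilities[] = $array;
    }
    return $possibilities;
}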

$regex        = '[ct]hun(k|der)(s|ed|ing)?';
$matchPattern = "/(\[\w+\]|\([\w|]+\))\??/";

preg_match_all($matchPattern, $regex, $matches);

printMatches(
    $regex,
    prepOptions($matches[0]),
    $matchPattern
);
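The question asked for an array ($possibleMatches) rather than printed lines. A minimal sketch of a variant that collects the results instead: parseRegexPattern() is the name used in the question's example, collectMatches() is a hypothetical helper, and both follow the same replace-the-leftmost-part recursion as printMatches() above, covering only [], (), | and ?.

```php
<?php
// Hypothetical helper: like printMatches(), but returns the strings instead of echoing them.
function collectMatches(string $pattern, array $options, string $matchPattern): array
{
    if (!$options) {
        return [$pattern]; // every variable part has been substituted
    }
    $current = array_shift($options);
    $results = [];
    foreach ($current as $option) {
        // Replace only the leftmost remaining variable part with this option.
        $modified = preg_replace($matchPattern, $option, $pattern, 1);
        $results = array_merge($results, collectMatches($modified, $options, $matchPattern));
    }
    return $results;
}

// Name taken from the question's example; supports [], (), | and ? only.
function parseRegexPattern(string $regex): array
{
    $matchPattern = '/(\[\w+\]|\([\w|]+\))\??/';
    preg_match_all($matchPattern, $regex, $matches);

    $options = [];
    foreach ($matches[0] as $match) {
        $clean = preg_replace('/[\[\]()?]/', '', $match);
        $choices = $match[0] === '[' ? str_split($clean) : explode('|', $clean);
        if (substr($match, -1) === '?') {
            $choices[] = ''; // optional part
        }
        $options[] = $choices;
    }
    return collectMatches($regex, $options, $matchPattern);
}

$possibleMatches = parseRegexPattern('[ct]hun(k|der)(s|ed|ing)?');
print_r($possibleMatches);
```

Merging at each level keeps the recursion simple; for very large patterns, passing a result array by reference would avoid the repeated copying that array_merge does.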

Additional functionality

Expanding nested groups

In practice, you would run this before the preg_match_all call.

$regex        = 'This happen(s|ed) to (be(come)?|hav(e|ing)) test case 1?';

echo preg_replace_callback("/(\(|\|)(\w+)(?:\(([\w\|]+)\)\??)/", function($array){
    $output = explode("|", $array[3]);
    if ($array[0][-1] === "?") {
        $output[] = "";
    }
    foreach ($output as &$option) {
        $option = $array[2] . $option;
    }
    return $array[1] . implode("|", $output);
}, $regex), PHP_EOL;

Output:

This happen(s|ed) to (become|be|have|having) test case 1?
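Wrapped as a function, the expansion can be tested on its own. expandNested() is a hypothetical name, and like the code above it only handles one level of nesting, where a word is immediately followed by an inner group:

```php
<?php
// Hypothetical wrapper around the nested-group expansion above.
function expandNested(string $regex): string
{
    return preg_replace_callback('/(\(|\|)(\w+)(?:\(([\w|]+)\)\??)/', function (array $m): string {
        $suffixes = explode('|', $m[3]);   // alternatives inside the inner group
        if (substr($m[0], -1) === '?') {
            $suffixes[] = '';              // optional inner group: the bare prefix is allowed too
        }
        $words = [];
        foreach ($suffixes as $suffix) {
            $words[] = $m[2] . $suffix;    // prefix each alternative with the preceding word
        }
        return $m[1] . implode('|', $words); // keep the leading "(" or "|"
    }, $regex);
}

echo expandNested('This happen(s|ed) to (be(come)?|hav(e|ing)) test case 1?'), PHP_EOL;
```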

Matching single letters

The core of this is to update the regex:

$matchPattern = "/(?:(\[\w+\]|\([\w|]+\))\??|(\w\?))/";

and add an else branch to the prepOptions function:

} else {
    $array = [$cleanString];
}
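To see what the updated regex picks out, here it is run against the example sentence after nested-group expansion, followed by what the new else branch plus the existing ? handling produce for the lone 1? part. A sketch only; the expanded string is written out by hand:

```php
<?php
// The updated pattern: a class or group (optionally followed by "?"),
// or a single word character followed by "?".
$matchPattern = '/(?:(\[\w+\]|\([\w|]+\))\??|(\w\?))/';

// The example sentence after nested-group expansion, written out by hand.
$expanded = 'This happen(s|ed) to (become|be|have|having) test case 1?';
preg_match_all($matchPattern, $expanded, $matches);
print_r($matches[0]); // the three variable parts, including "1?"

// What prepOptions() builds for "1?": the else branch keeps the letter,
// and the "?" check adds the empty option.
$part = '1?';
$cleanString = preg_replace('/[\[\]()?]/', '', $part);
$options = [$cleanString];
if (substr($part, -1) === '?') {
    $options[] = '';
}
print_r($options);
```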

Full working example

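// Recursively substitute each option for the first remaining variable part
// of $pattern, echoing the result once every part has been replaced.
function printMatches($pattern, $array, $matchPattern)
{
    $currentArray = array_shift($array);

    foreach ($currentArray as $option) {
        // Limit of 1: replace only the leftmost remaining variable part.
        $patternModified = preg_replace($matchPattern, $option, $pattern, 1);
        if (!count($array)) {
            echo $patternModified, PHP_EOL;
        } else {
            printMatches($patternModified, $array, $matchPattern);
        }
    }
}

// Turn each matched part into its list of options.
function prepOptions($matches)
{
    $possibilities = [];
    foreach ($matches as $match) {
        // Strip brackets, parentheses and "?" to leave the bare options.
        $cleanString = preg_replace("/[\[\]\(\)\?]/", "", $match);

        if ($match[0] === "[") {
            $array = str_split($cleanString, 1); // character class: one option per letter
        } elseif ($match[0] === "(") {
            $array = explode("|", $cleanString); // subgroup: one option per alternative
        } else {
            $array = [$cleanString];             // single letter followed by "?"
        }
        // Negative string offsets need PHP 7.1+.
        if ($match[-1] === "?") {
            $array[] = "";                       // optional part: it may also be left out
        }
        $possibilities[] = $array;
    }
    return $possibilities;
}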

$regex        = 'This happen(s|ed) to (be(come)?|hav(e|ing)) test case 1?';
$matchPattern = "/(?:(\[\w+\]|\([\w|]+\))\??|(\w\?))/";

// Expand one level of nesting first, e.g. "(be(come)?|hav(e|ing))"
// becomes "(become|be|have|having)".
$regex = preg_replace_callback("/(\(|\|)(\w+)(?:\(([\w\|]+)\)\??)/", function($array){
    $output = explode("|", $array[3]);
    if ($array[0][-1] === "?") {
        $output[] = "";
    }
    foreach ($output as &$option) {
        $option = $array[2] . $option;
    }
    return $array[1] . implode("|", $output);
}, $regex);

preg_match_all($matchPattern, $regex, $matches);

printMatches(
    $regex,
    prepOptions($matches[0]),
    $matchPattern
);

Output:

This happens to become test case 1
This happens to become test case 
This happens to be test case 1
This happens to be test case 
This happens to have test case 1
This happens to have test case 
This happens to having test case 1
This happens to having test case 
This happened to become test case 1
This happened to become test case 
This happened to be test case 1
This happened to be test case 
This happened to have test case 1
This happened to have test case 
This happened to having test case 1
This happened to having test case 
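The recursion above re-runs preg_replace on every partial string. A different technique, sketched here as an alternative rather than the answer's method, splits the already-expanded pattern into literal text and variable parts with preg_split() and PREG_SPLIT_DELIM_CAPTURE, then builds every combination as an iterative Cartesian product. expandPattern() is a hypothetical name, and nested groups are assumed to have been expanded first:

```php
<?php
// Hypothetical alternative: split the (already nested-expanded) pattern into
// literal text and variable parts, then build every combination iteratively.
function expandPattern(string $regex): array
{
    // One capturing group around the whole delimiter, so with
    // PREG_SPLIT_DELIM_CAPTURE the odd indexes hold the variable parts.
    $partPattern = '/((?:\[\w+\]|\([\w|]+\))\??|\w\?)/';
    $pieces = preg_split($partPattern, $regex, -1, PREG_SPLIT_DELIM_CAPTURE);

    $results = [''];
    foreach ($pieces as $i => $piece) {
        if ($i % 2 === 0) {
            $choices = [$piece]; // literal text: exactly one option
        } else {
            $clean = preg_replace('/[\[\]()?]/', '', $piece);
            if ($piece[0] === '[') {
                $choices = str_split($clean);      // character class
            } elseif ($piece[0] === '(') {
                $choices = explode('|', $clean);   // subgroup
            } else {
                $choices = [$clean];               // single letter followed by "?"
            }
            if (substr($piece, -1) === '?') {
                $choices[] = '';                   // optional part
            }
        }
        // Extend every partial result with every choice for this piece.
        $next = [];
        foreach ($results as $prefix) {
            foreach ($choices as $choice) {
                $next[] = $prefix . $choice;
            }
        }
        $results = $next;
    }
    return $results;
}

print_r(expandPattern('[ct]hun(k|der)(s|ed|ing)?'));
```

Because literal text is treated as a one-option part, no regex replacement is needed after the initial split.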

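Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM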