繁体   English   中英

为 PHP 中的正则表达式模式生成所有可能的匹配项

[英]Generate all possible matches for regex pattern in PHP

SO上有很多问题询问如何解析正则表达式模式和output所有可能的匹配模式。 但是,出于某种原因,我能找到的每一个 1、2、3、4、5、6、7 可能更多)要么用于Java ,要么用于C (只有一个用于 JavaScript),我目前需要在 PHP 中执行此操作。

我已经用谷歌搜索了我内心的(dis)内容,但无论我做什么,谷歌给我的几乎唯一的东西就是指向preg_match()的文档的链接和关于如何使用正则表达式的页面,这与我的相反想在这里。

我的正则表达式模式都非常简单并且保证是有限的; 唯一使用的语法是:

  • []用于字符类
  • ()用于子组(不需要捕获)
  • | (管道)用于子组内的替代匹配
  • ? 对于零或一匹配

所以一个例子可能是[ct]hun(k|der)(s|ed|ing)? 匹配动词chunkthunkchunder和 Thunder 的所有可能的forms ,总共有 16 个排列。

理想情况下,有一个用于 PHP 的库或工具,它将遍历(有限)正则表达式模式和 output 所有可能的匹配,准备好 Z34D1F91FB2E514B8576Z34D1F91FB2E514B8576ZFAB1A75A89A6B。 有谁知道这样的库/工具是否已经存在?

如果不是,那么制作一个优化的方法是什么? JavaScript 的这个答案是我能找到的最接近我应该能够适应的东西,但不幸的是我无法理解它的实际工作原理,这使得适应它更加棘手。 另外,无论如何,在 PHP 中可能有更好的方法。 关于如何最好地分解任务的一些逻辑指针将不胜感激。

编辑:由于显然不清楚这在实践中会如何,我正在寻找允许这种类型输入的东西:

$possibleMatches = parseRegexPattern('[ct]hun(k|der)(s|ed|ing)?');

– 然后打印$possibleMatches应该给出这样的结果(在我的情况下,元素的顺序并不重要):

Array
(
    [0] => chunk
    [1] => thunk
    [2] => chunks
    [3] => thunks
    [4] => chunked
    [5] => thunked
    [6] => chunking
    [7] => thunking
    [8] => chunder
    [9] => thunder
    [10] => chunders
    [11] => thunders
    [12] => chundered
    [13] => thundered
    [14] => chundering
    [15] => thundering
)

方法

  1. 您需要去除可变模式 你可以使用preg_match_all来做到这一点

    preg_match_all("/(\[\w+\]|\([\w|]+\))/", '[ct]hun(k|der)(s|ed|ing)?', $matches); /* Regex: /(\[\w+\]|\([\w|]+\))/ /: Pattern delimiter (: Start of capture group \[\w+\]: Character class pattern |: OR operator \([\w|]+\): Capture group pattern ): End of capture group /: Pattern delimiter */
  2. 然后,您可以将捕获组扩展为字母或单词(取决于类型)

     $array = str_split($cleanString, 1); // For a character class $array = explode("|", $cleanString); // For a capture group
  3. 以递归方式遍历每个$array

代码

function printMatches($pattern, $array, $matchPattern)
{
    $currentArray = array_shift($array);

    foreach ($currentArray as $option) {
        $patternModified = preg_replace($matchPattern, $option, $pattern, 1);
        if (!count($array)) {
            echo $patternModified, PHP_EOL;
        } else {
            printMatches($patternModified, $array, $matchPattern);
        }
    }
}

function prepOptions($matches)
{
    foreach ($matches as $match) {
        $cleanString = preg_replace("/[\[\]\(\)\?]/", "", $match);
        
        if ($match[0] === "[") {
            $array = str_split($cleanString, 1);
        } elseif ($match[0] === "(") {
            $array = explode("|", $cleanString);
        }
        if ($match[-1] === "?") {
            $array[] = "";
        }
        $possibilites[] = $array;
    }
    return $possibilites;
}

$regex        = '[ct]hun(k|der)(s|ed|ing)?';
$matchPattern = "/(\[\w+\]|\([\w|]+\))\??/";

preg_match_all($matchPattern, $regex, $matches);

printMatches(
    $regex,
    prepOptions($matches[0]),
    $matchPattern
);

附加功能

扩展嵌套组

在使用中,您可以将它放在“preg_match_all”之前。

$regex        = 'This happen(s|ed) to (be(come)?|hav(e|ing)) test case 1?';

echo preg_replace_callback("/(\(|\|)(\w+)(?:\(([\w\|]+)\)\??)/", function($array){
    $output = explode("|", $array[3]);
    if ($array[0][-1] === "?") {
        $output[] = "";
    }
    foreach ($output as &$option) {
        $option = $array[2] . $option;
    }
    return $array[1] . implode("|", $output);
}, $regex), PHP_EOL;

Output:

This happen(s|ed) to (become|be|have|having) test case 1?

匹配单个字母

这样做的重点是更新正则表达式:

$matchPattern = "/(?:(\[\w+\]|\([\w|]+\))\??|(\w\?))/";

并将else添加到prepOptions function:

} else {
    $array = [$cleanString];
}

完整的工作示例

function printMatches($pattern, $array, $matchPattern)
{
    $currentArray = array_shift($array);

    foreach ($currentArray as $option) {
        $patternModified = preg_replace($matchPattern, $option, $pattern, 1);
        if (!count($array)) {
            echo $patternModified, PHP_EOL;
        } else {
            printMatches($patternModified, $array, $matchPattern);
        }
    }
}

function prepOptions($matches)
{
    foreach ($matches as $match) {
        $cleanString = preg_replace("/[\[\]\(\)\?]/", "", $match);
        
        if ($match[0] === "[") {
            $array = str_split($cleanString, 1);
        } elseif ($match[0] === "(") {
            $array = explode("|", $cleanString);
        } else {
            $array = [$cleanString];
        }
        if ($match[-1] === "?") {
            $array[] = "";
        }
        $possibilites[] = $array;
    }
    return $possibilites;
}

$regex        = 'This happen(s|ed) to (be(come)?|hav(e|ing)) test case 1?';
$matchPattern = "/(?:(\[\w+\]|\([\w|]+\))\??|(\w\?))/";

$regex = preg_replace_callback("/(\(|\|)(\w+)(?:\(([\w\|]+)\)\??)/", function($array){
    $output = explode("|", $array[3]);
    if ($array[0][-1] === "?") {
        $output[] = "";
    }
    foreach ($output as &$option) {
        $option = $array[2] . $option;
    }
    return $array[1] . implode("|", $output);
}, $regex);


preg_match_all($matchPattern, $regex, $matches);

printMatches(
    $regex,
    prepOptions($matches[0]),
    $matchPattern
);

Output:

This happens to become test case 1
This happens to become test case 
This happens to be test case 1
This happens to be test case 
This happens to have test case 1
This happens to have test case 
This happens to having test case 1
This happens to having test case 
This happened to become test case 1
This happened to become test case 
This happened to be test case 1
This happened to be test case 
This happened to have test case 1
This happened to have test case 
This happened to having test case 1
This happened to having test case 

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM