Generate all possible matches for regex pattern in PHP
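There are quite a few questions on SO asking how to parse a regex pattern and output all possible matches for that pattern. For some reason, though, every one I can find (1, 2, 3, 4, 5, 6, 7, probably more) is either for Java or for C (and just one for JavaScript), and I currently need to do this in PHP.
I've Googled to my heart's (dis)content, but whatever I do, pretty much the only thing Google gives me is links to the docs for preg_match() and pages about how to use regex, which is the opposite of what I want here.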
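My regex patterns are all very simple and guaranteed to be finite; the only syntax used is:
[] for character classes
() for subgroups (capturing not required)
| (pipe) for alternative matches within subgroups
? for zero-or-one matches
So an example might be [ct]hun(k|der)(s|ed|ing)? to match all possible forms of the verbs chunk, thunk, chunder and thunder, for a total of 16 permutations.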
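Ideally, there would be a library or tool for PHP that iterates through (finite) regex patterns and outputs all possible matches, ready for use. Does anyone know if such a library/tool already exists?
If not, what would be an optimal way to approach making one? This answer for JavaScript is the closest I have been able to find to something I should be able to adapt, but unfortunately I can't wrap my head around how it actually works, which makes adapting it trickier. There may well be better ways of doing it in PHP anyway. Some logical pointers on how best to break the task down would be greatly appreciated.
Edit: Since it apparently wasn't clear how this would look in practice, I am looking for something that would allow this type of input: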
$possibleMatches = parseRegexPattern('[ct]hun(k|der)(s|ed|ing)?');
Printing $possibleMatches should then give something like this (the order of the elements is not important in my case):
Array
(
[0] => chunk
[1] => thunk
[2] => chunks
[3] => thunks
[4] => chunked
[5] => thunked
[6] => chunking
[7] => thunking
[8] => chunder
[9] => thunder
[10] => chunders
[11] => thunders
[12] => chundered
[13] => thundered
[14] => chundering
[15] => thundering
)
You need to strip out the variable patterns; you can use preg_match_all to do this:
preg_match_all("/(\[\w+\]|\([\w|]+\))/", '[ct]hun(k|der)(s|ed|ing)?', $matches);

/*
Regex:
/(\[\w+\]|\([\w|]+\))/
/           : Pattern delimiter
(           : Start of capture group
\[\w+\]     : Character class pattern
|           : OR operator
\([\w|]+\)  : Capture group pattern
)           : End of capture group
/           : Pattern delimiter
*/
Then you can expand each captured pattern into letters or words (depending on its type):
$array = str_split($cleanString, 1); // For a character class
$array = explode("|", $cleanString); // For a capture group
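The two steps above can be tried in isolation. The following is a minimal sketch, assuming only the syntax listed in the question; the helper name expandToken is hypothetical and is not part of the answer's code. It shows which tokens preg_match_all extracts and what each one expands to.

```php
<?php
// Hypothetical helper (name assumed for illustration):
// turns one extracted token into its list of options.
function expandToken(string $token): array
{
    // Strip the surrounding brackets or parentheses.
    $inner = substr($token, 1, -1);

    return $token[0] === '['
        ? str_split($inner)      // character class: one option per character
        : explode('|', $inner);  // group: one option per alternative
}

// Same pattern as in the answer (without the optional "?" handling).
preg_match_all('/(\[\w+\]|\([\w|]+\))/', '[ct]hun(k|der)(s|ed|ing)?', $matches);

$options = array_map('expandToken', $matches[0]);

print_r($matches[0]); // the tokens: [ct], (k|der), (s|ed|ing)
print_r($options);    // the options for each token
```

Note that this version does not yet see the trailing ?; the full example handles that by adding \?? to the pattern and appending an empty option.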
Then work your way through each $array recursively:
function printMatches($pattern, $array, $matchPattern)
{
    // Take the options for the next variable part of the pattern.
    $currentArray = array_shift($array);

    foreach ($currentArray as $option) {
        // Substitute the option for the first remaining variable part.
        $patternModified = preg_replace($matchPattern, $option, $pattern, 1);

        if (!count($array)) {
            echo $patternModified, PHP_EOL;
        } else {
            printMatches($patternModified, $array, $matchPattern);
        }
    }
}

function prepOptions($matches)
{
    $possibilities = [];

    foreach ($matches as $match) {
        $cleanString = preg_replace("/[\[\]\(\)\?]/", "", $match);

        if ($match[0] === "[") {
            $array = str_split($cleanString, 1);
        } elseif ($match[0] === "(") {
            $array = explode("|", $cleanString);
        }

        // A trailing "?" means the part may also be absent.
        if ($match[-1] === "?") {
            $array[] = "";
        }

        $possibilities[] = $array;
    }

    return $possibilities;
}

$regex        = '[ct]hun(k|der)(s|ed|ing)?';
$matchPattern = "/(\[\w+\]|\([\w|]+\))\??/";

preg_match_all($matchPattern, $regex, $matches);

printMatches(
    $regex,
    prepOptions($matches[0]),
    $matchPattern
);
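The question asked for a function that returns the matches as an array rather than echoing them. A minimal sketch of that variant follows, using the same substitute-and-recurse idea. The name parseRegexPattern comes from the question, but this body is an assumed adaptation and not the answerer's code. It uses substr_replace instead of preg_replace so that the option is inserted as plain text.

```php
<?php
// Sketch only: supports [], (), | and ? on a class or group, without nesting.
function parseRegexPattern(string $pattern): array
{
    $tokenPattern = '/(\[\w+\]|\([\w|]+\))\??/';

    // No variable part left: the pattern is a finished string.
    if (!preg_match($tokenPattern, $pattern, $m, PREG_OFFSET_CAPTURE)) {
        return [$pattern];
    }

    [$token, $offset] = $m[0];
    $optional = substr($token, -1) === '?';

    // Drop the brackets (and the trailing "?" if present).
    $inner   = substr($token, 1, $optional ? -2 : -1);
    $options = $token[0] === '[' ? str_split($inner) : explode('|', $inner);
    if ($optional) {
        $options[] = '';
    }

    $results = [];
    foreach ($options as $option) {
        // Replace the first token with this option, then expand the rest.
        $expanded = substr_replace($pattern, $option, $offset, strlen($token));
        foreach (parseRegexPattern($expanded) as $result) {
            $results[] = $result;
        }
    }

    return $results;
}

$possibleMatches = parseRegexPattern('[ct]hun(k|der)(s|ed|ing)?');
print_r($possibleMatches);
```

For the example pattern this yields the 16 forms listed in the question, though in a different order.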
To handle nested groups, flatten them first; in use you would put this before the preg_match_all call:
$regex = 'This happen(s|ed) to (be(come)?|hav(e|ing)) test case 1?';

echo preg_replace_callback("/(\(|\|)(\w+)(?:\(([\w\|]+)\)\??)/", function($array){
    $output = explode("|", $array[3]);
    if ($array[0][-1] === "?") {
        $output[] = "";
    }
    foreach ($output as &$option) {
        $option = $array[2] . $option;
    }
    return $array[1] . implode("|", $output);
}, $regex), PHP_EOL;
Output:
This happen(s|ed) to (become|be|have|having) test case 1?
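To also match single optional characters (such as the trailing 1? above), the gist is to update the regex:
$matchPattern = "/(?:(\[\w+\]|\([\w|]+\))\??|(\w\?))/";
and add an else to the prepOptions function:
} else {
    $array = [$cleanString];
}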
function printMatches($pattern, $array, $matchPattern)
{
    // Take the options for the next variable part of the pattern.
    $currentArray = array_shift($array);

    foreach ($currentArray as $option) {
        // Substitute the option for the first remaining variable part.
        $patternModified = preg_replace($matchPattern, $option, $pattern, 1);

        if (!count($array)) {
            echo $patternModified, PHP_EOL;
        } else {
            printMatches($patternModified, $array, $matchPattern);
        }
    }
}

function prepOptions($matches)
{
    $possibilities = [];

    foreach ($matches as $match) {
        $cleanString = preg_replace("/[\[\]\(\)\?]/", "", $match);

        if ($match[0] === "[") {
            $array = str_split($cleanString, 1);
        } elseif ($match[0] === "(") {
            $array = explode("|", $cleanString);
        } else {
            // A single optional character such as "1?".
            $array = [$cleanString];
        }

        // A trailing "?" means the part may also be absent.
        if ($match[-1] === "?") {
            $array[] = "";
        }

        $possibilities[] = $array;
    }

    return $possibilities;
}

$regex        = 'This happen(s|ed) to (be(come)?|hav(e|ing)) test case 1?';
$matchPattern = "/(?:(\[\w+\]|\([\w|]+\))\??|(\w\?))/";

// Flatten nested groups before extracting the variable parts.
$regex = preg_replace_callback("/(\(|\|)(\w+)(?:\(([\w\|]+)\)\??)/", function($array){
    $output = explode("|", $array[3]);
    if ($array[0][-1] === "?") {
        $output[] = "";
    }
    foreach ($output as &$option) {
        $option = $array[2] . $option;
    }
    return $array[1] . implode("|", $output);
}, $regex);

preg_match_all($matchPattern, $regex, $matches);

printMatches(
    $regex,
    prepOptions($matches[0]),
    $matchPattern
);
Output:
This happens to become test case 1
This happens to become test case
This happens to be test case 1
This happens to be test case
This happens to have test case 1
This happens to have test case
This happens to having test case 1
This happens to having test case
This happened to become test case 1
This happened to become test case
This happened to be test case 1
This happened to be test case
This happened to have test case 1
This happened to have test case
This happened to having test case 1
This happened to having test case
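As a quick sanity check on the listing above, the number of lines should equal the product of the option counts for each variable part. The sketch below hard-codes the three option lists as they appear in the flattened pattern and the output; the variable names are assumptions for illustration.

```php
<?php
// Option lists for (s|ed), (become|be|have|having) and 1?.
$options = [
    ['s', 'ed'],
    ['become', 'be', 'have', 'having'],
    ['1', ''],
];

// 2 * 4 * 2 combinations in total.
$expectedCount = array_product(array_map('count', $options));

echo $expectedCount, PHP_EOL; // 16
```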