Generate all possible matches for regex pattern in PHP
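There are quite a few questions on SO asking how to parse a regex pattern and output all possible matches for it. For some reason, though, every one of them I can find (1, 2, 3, 4, 5, 6, 7, probably more) is either for Java or for some variety of C (and just one for JavaScript), and I currently need to do this in PHP.
I've Googled to my heart's (dis)content, but whatever I do, pretty much the only thing Google gives me is links to the docs for preg_match() and pages about how to use regex, which is the opposite of what I want here.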
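My regex patterns are all very simple and guaranteed to be finite; the only syntax used is:
[] for character classes
() for subgroups (capturing not required)
| (pipe) for alternative matches within subgroups
? for zero-or-one matches
So an example might be [ct]hun(k|der)(s|ed|ing)? to match all possible forms of the verbs chunk, thunk, chunder and thunder, for a total of 16 permutations.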
Ideally, there would be a library or tool for PHP that iterates through (finite) regex patterns and outputs all possible matches, ready to use. Does anyone know if such a library/tool already exists?
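If not, what is an optimised way to approach making one? This answer for JavaScript is the closest I've been able to find to something I should be able to adapt, but unfortunately I can't wrap my head around how it actually works, which makes adapting it trickier. Plus, there may well be better ways of doing it in PHP anyway. Some logical pointers on how best to break the task down would be greatly appreciated.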
Edit: Since it apparently wasn't clear how this would look in practice, I am looking for something that allows this type of input:
$possibleMatches = parseRegexPattern('[ct]hun(k|der)(s|ed|ing)?');
and then printing $possibleMatches should give something like this (the order of the elements is not important in my case):
Array
(
[0] => chunk
[1] => thunk
[2] => chunks
[3] => thunks
[4] => chunked
[5] => thunked
[6] => chunking
[7] => thunking
[8] => chunder
[9] => thunder
[10] => chunders
[11] => thunders
[12] => chundered
[13] => thundered
[14] => chundering
[15] => thundering
)
You need to strip out the variable patterns; you can use preg_match_all to do this:
preg_match_all("/(\[\w+\]|\([\w|]+\))/", '[ct]hun(k|der)(s|ed|ing)?', $matches);

/* Regex:

/(\[\w+\]|\([\w|]+\))/
/             : Pattern delimiter
(             : Start of capture group
\[\w+\]       : Character class pattern
|             : OR operator
\([\w|]+\)    : Capture group pattern
)             : End of capture group
/             : Pattern delimiter

*/
Then you can expand the capture groups into letters or words (depending on the type):
$array = str_split($cleanString, 1); // For a character class
$array = explode("|", $cleanString); // For a capture group
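To make the two steps above concrete, here is a minimal sketch that extracts the variable tokens from the example pattern and expands each one into its list of options. The names $tokens and $expanded are chosen for this illustration only, and the brackets are stripped with substr rather than a regex, which is an assumption about how $cleanString might be produced.

```php
<?php
// Illustrative sketch: $tokens and $expanded are hypothetical names.
$pattern = '[ct]hun(k|der)(s|ed|ing)?';

// Collect every character class or group; the trailing "?" is not captured here.
preg_match_all('/(\[\w+\]|\([\w|]+\))/', $pattern, $matches);
$tokens = $matches[0];

$expanded = [];
foreach ($tokens as $token) {
    // Drop the surrounding brackets or parentheses.
    $clean = substr($token, 1, -1);

    $expanded[] = $token[0] === '['
        ? str_split($clean, 1)   // character class: one option per character
        : explode('|', $clean);  // group: one option per alternative
}

print_r($expanded);
```

Each entry of $expanded then holds the choices for one variable part of the pattern, in the order they appear.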
Recursively work through each $array:
function printMatches($pattern, $array, $matchPattern)
{
    // Take the options for the next variable token in the pattern.
    $currentArray = array_shift($array);

    foreach ($currentArray as $option) {
        // Replace only the first remaining token with this option.
        $patternModified = preg_replace($matchPattern, $option, $pattern, 1);

        if (!count($array)) {
            echo $patternModified, PHP_EOL;
        } else {
            printMatches($patternModified, $array, $matchPattern);
        }
    }
}

function prepOptions($matches)
{
    $possibilities = [];

    foreach ($matches as $match) {
        $cleanString = preg_replace("/[\[\]\(\)\?]/", "", $match);

        if ($match[0] === "[") {
            $array = str_split($cleanString, 1);
        } elseif ($match[0] === "(") {
            $array = explode("|", $cleanString);
        }

        // Negative string offsets need PHP 7.1 or later.
        if ($match[-1] === "?") {
            $array[] = "";
        }

        $possibilities[] = $array;
    }

    return $possibilities;
}

$regex = '[ct]hun(k|der)(s|ed|ing)?';
$matchPattern = "/(\[\w+\]|\([\w|]+\))\??/";

preg_match_all($matchPattern, $regex, $matches);

printMatches(
    $regex,
    prepOptions($matches[0]),
    $matchPattern
);
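The question asks for a call that returns the matches as an array rather than printing them. The following is a sketch of one way to get that, not the answer's own method: it splits the pattern into literal text and variable tokens with preg_split and builds the strings by concatenation. The function name parseRegexPattern is taken from the question; its body is an assumption, and it only handles flat character classes and groups with an optional trailing "?".

```php
<?php
// Hypothetical helper named after the call in the question. It assumes
// flat [..] classes and (..|..) groups only, each optionally followed by "?".
function parseRegexPattern(string $pattern): array
{
    // The outer capture group makes preg_split keep each token, "?" included.
    $tokenPattern = '/((?:\[\w+\]|\([\w|]+\))\??)/';
    $parts = preg_split($tokenPattern, $pattern, -1, PREG_SPLIT_DELIM_CAPTURE);

    $results = [''];
    foreach ($parts as $index => $part) {
        if ($index % 2 === 0) {
            // Even positions hold the literal text between tokens.
            $options = [$part];
        } else {
            $optional = substr($part, -1) === '?';
            $token = $optional ? substr($part, 0, -1) : $part;
            $body = substr($token, 1, -1);

            $options = $token[0] === '['
                ? str_split($body, 1)
                : explode('|', $body);

            if ($optional) {
                $options[] = ''; // zero occurrences
            }
        }

        // Extend every string built so far with every option for this part.
        $next = [];
        foreach ($results as $prefix) {
            foreach ($options as $option) {
                $next[] = $prefix . $option;
            }
        }
        $results = $next;
    }

    return $results;
}

$possibleMatches = parseRegexPattern('[ct]hun(k|der)(s|ed|ing)?');
print_r($possibleMatches);
```

For the example pattern this yields the 16 forms listed in the question, though in a different order.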
To expand nested groups first, you would put this before the preg_match_all call:
$regex = 'This happen(s|ed) to (be(come)?|hav(e|ing)) test case 1?';

echo preg_replace_callback("/(\(|\|)(\w+)(?:\(([\w\|]+)\)\??)/", function($array){
    $output = explode("|", $array[3]);

    if ($array[0][-1] === "?") {
        $output[] = "";
    }

    foreach ($output as &$option) {
        $option = $array[2] . $option;
    }

    return $array[1] . implode("|", $output);
}, $regex), PHP_EOL;
Output:
This happen(s|ed) to (become|be|have|having) test case 1?
To also handle single optional characters (such as the trailing 1?), the gist is to update the regex:
$matchPattern = "/(?:(\[\w+\]|\([\w|]+\))\??|(\w\?))/";
and add an else to the prepOptions function:
} else {
$array = [$cleanString];
}
Putting it all together:
function printMatches($pattern, $array, $matchPattern)
{
    // Take the options for the next variable token in the pattern.
    $currentArray = array_shift($array);

    foreach ($currentArray as $option) {
        // Replace only the first remaining token with this option.
        $patternModified = preg_replace($matchPattern, $option, $pattern, 1);

        if (!count($array)) {
            echo $patternModified, PHP_EOL;
        } else {
            printMatches($patternModified, $array, $matchPattern);
        }
    }
}

function prepOptions($matches)
{
    $possibilities = [];

    foreach ($matches as $match) {
        $cleanString = preg_replace("/[\[\]\(\)\?]/", "", $match);

        if ($match[0] === "[") {
            $array = str_split($cleanString, 1);
        } elseif ($match[0] === "(") {
            $array = explode("|", $cleanString);
        } else {
            // A single optional character such as "1?".
            $array = [$cleanString];
        }

        // Negative string offsets need PHP 7.1 or later.
        if ($match[-1] === "?") {
            $array[] = "";
        }

        $possibilities[] = $array;
    }

    return $possibilities;
}

$regex = 'This happen(s|ed) to (be(come)?|hav(e|ing)) test case 1?';
$matchPattern = "/(?:(\[\w+\]|\([\w|]+\))\??|(\w\?))/";

// Flatten nested groups such as be(come)? into become|be first.
$regex = preg_replace_callback("/(\(|\|)(\w+)(?:\(([\w\|]+)\)\??)/", function($array){
    $output = explode("|", $array[3]);

    if ($array[0][-1] === "?") {
        $output[] = "";
    }

    foreach ($output as &$option) {
        $option = $array[2] . $option;
    }

    return $array[1] . implode("|", $output);
}, $regex);

preg_match_all($matchPattern, $regex, $matches);

printMatches(
    $regex,
    prepOptions($matches[0]),
    $matchPattern
);
Output:
This happens to become test case 1
This happens to become test case
This happens to be test case 1
This happens to be test case
This happens to have test case 1
This happens to have test case
This happens to having test case 1
This happens to having test case
This happened to become test case 1
This happened to become test case
This happened to be test case 1
This happened to be test case
This happened to have test case 1
This happened to have test case
This happened to having test case 1
This happened to having test case
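One way to sanity-check generated output like this is to feed each string back through the original pattern with preg_match. The sketch below assumes the pattern contains no "/" (so it can be wrapped in slash delimiters as is); $candidates is sample data picked for this illustration, including one string that should not match.

```php
<?php
// Sketch: $regex is the pattern from the answer; $candidates is sample data
// chosen for this illustration.
$regex = 'This happen(s|ed) to (be(come)?|hav(e|ing)) test case 1?';

$candidates = [
    'This happens to become test case 1',
    'This happened to having test case ', // the trailing space is part of the match
    'This happens to be test',            // deliberately incomplete
];

$results = [];
foreach ($candidates as $candidate) {
    // Anchor the pattern so the whole string has to match, not just a substring.
    $results[$candidate] = preg_match('/^(?:' . $regex . ')$/', $candidate) === 1;
}

var_dump($results);
```

Any generated string that comes back false would point to a bug in the expansion step.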