
Generate all possible matches for regex pattern in PHP

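There are quite a few questions on SO asking how to parse a regex pattern and output all possible matches for that pattern. For some reason, though, every one I can find (1, 2, 3, 4, 5, 6, 7, and probably more) is either for Java or for some variety of C (and just one for JavaScript), and I currently need to do this in PHP.

I have Googled to my heart's (dis)content, but whatever I do, pretty much the only thing Google gives me is links to the docs for preg_match() and pages about how to use regex, which is the opposite of what I want here.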

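My regex patterns are all very simple and guaranteed to be finite; the only syntax used is:

  • [] for character classes
  • () for subgroups (capturing not required)
  • | (pipe) for alternative matches within subgroups
  • ? for zero-or-one matches

So an example might be [ct]hun(k|der)(s|ed|ing)? to match all possible forms of the verbs chunk, thunk, chunder and thunder, for a total of 16 permutations.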
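As a quick check of that count, the total is the product of the number of options each variable part contributes. The sketch below only does the arithmetic for the example pattern; the array keys are labels for illustration, not parsed input.

```php
<?php
// Number of options contributed by each variable part of
// [ct]hun(k|der)(s|ed|ing)?
$optionCounts = [
    '[ct]'        => 2,     // c, t
    '(k|der)'     => 2,     // k, der
    '(s|ed|ing)?' => 3 + 1, // s, ed, ing, or nothing at all
];

// The literal text "hun" contributes exactly one option, so it is left out.
$total = array_product($optionCounts);

echo $total, PHP_EOL;
```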

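Ideally, there would be a library or tool for PHP that iterates through (finite) regex patterns and outputs all possible matches, ready to go. Does anyone know if such a library or tool already exists?

If not, what is an optimal approach to making one? This answer for JavaScript is the closest I have been able to find to something I should be able to adapt, but unfortunately I cannot wrap my head around how it actually works, which makes adapting it trickier. Besides, there may well be better ways of doing it in PHP anyway. Some logical pointers on how best to break the task down would be greatly appreciated.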

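Edit: Since it apparently was not clear what this would look like in practice, I am looking for something that allows this type of input: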

$possibleMatches = parseRegexPattern('[ct]hun(k|der)(s|ed|ing)?');

– and printing $possibleMatches should then give something like this (the order of the elements is not important in my case):

Array
(
    [0] => chunk
    [1] => thunk
    [2] => chunks
    [3] => thunks
    [4] => chunked
    [5] => thunked
    [6] => chunking
    [7] => thunking
    [8] => chunder
    [9] => thunder
    [10] => chunders
    [11] => thunders
    [12] => chundered
    [13] => thundered
    [14] => chundering
    [15] => thundering
)

Method

  1. You need to strip out the variable patterns. You can use preg_match_all to do this:

    preg_match_all("/(\[\w+\]|\([\w|]+\))/", '[ct]hun(k|der)(s|ed|ing)?', $matches);

    /*
    Regex: /(\[\w+\]|\([\w|]+\))/
        /           : Pattern delimiter
        (           : Start of capture group
        \[\w+\]     : Character class pattern
        |           : OR operator
        \([\w|]+\)  : Capture group pattern
        )           : End of capture group
        /           : Pattern delimiter
    */

  2. You can then expand the capture groups to letters or words (depending on type):

     $array = str_split($cleanString, 1); // For a character class
     $array = explode("|", $cleanString); // For a capture group

  3. Work your way through each $array recursively.
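To make steps 1 and 2 concrete, here is a minimal sketch of what they produce for the example pattern. The variable names `$pattern`, `$expanded`, `$part` and `$inner` are chosen for illustration only, and the optional marker `?` is deliberately ignored here.

```php
<?php
$pattern = '[ct]hun(k|der)(s|ed|ing)?';

// Step 1: find the variable parts of the pattern.
preg_match_all('/(\[\w+\]|\([\w|]+\))/', $pattern, $matches);

// Step 2: expand each part into its list of options.
$expanded = [];
foreach ($matches[0] as $part) {
    $inner = substr($part, 1, -1); // drop the surrounding brackets
    $expanded[] = $part[0] === '['
        ? str_split($inner)        // character class: one option per character
        : explode('|', $inner);    // group: one option per alternative
}

print_r($matches[0]); // the parts found: [ct], (k|der), (s|ed|ing)
print_r($expanded);   // the options for each part
```

Step 3 then combines one option from each list with the literal text in between.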

Code

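/**
 * Recursively substitute each option for the first remaining variable part
 * of $pattern and print the result once no variable parts are left.
 */
function printMatches($pattern, $array, $matchPattern)
{
    // Options for the first variable part still present in $pattern.
    $currentArray = array_shift($array);

    foreach ($currentArray as $option) {
        // Replace only the first match (limit 1) with the current option.
        $patternModified = preg_replace($matchPattern, $option, $pattern, 1);
        if (!count($array)) {
            echo $patternModified, PHP_EOL;
        } else {
            printMatches($patternModified, $array, $matchPattern);
        }
    }
}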

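function prepOptions($matches)
{
    $possibilities = [];

    foreach ($matches as $match) {
        // Strip the brackets and the optional marker, leaving only the options.
        $cleanString = preg_replace("/[\[\]\(\)\?]/", "", $match);

        if ($match[0] === "[") {
            $array = str_split($cleanString, 1);
        } elseif ($match[0] === "(") {
            $array = explode("|", $cleanString);
        }
        // A trailing "?" makes the part optional, so add the empty string.
        // (Negative string offsets need PHP 7.1 or later.)
        if ($match[-1] === "?") {
            $array[] = "";
        }
        $possibilities[] = $array;
    }
    return $possibilities;
}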

$regex        = '[ct]hun(k|der)(s|ed|ing)?';
$matchPattern = "/(\[\w+\]|\([\w|]+\))\??/";

preg_match_all($matchPattern, $regex, $matches);

printMatches(
    $regex,
    prepOptions($matches[0]),
    $matchPattern
);
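The question asks for an array rather than printed output. The following is a minimal sketch of one way to get that, not a drop-in replacement for the code above: it splits the pattern into literal and variable tokens and builds the Cartesian product iteratively. It assumes flat patterns only (no nested groups, no optional single letters). The name `parseRegexPattern` is borrowed from the example call in the question and is hypothetical, not an existing library function.

```php
<?php
// Hypothetical helper; handles only [..], (..|..) and a trailing ? on those.
function parseRegexPattern(string $pattern): array
{
    // Split into literal text and variable parts, keeping the variable parts.
    $tokens = preg_split(
        '/(\[\w+\]\??|\([\w|]+\)\??)/',
        $pattern,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    $results = [''];
    foreach ($tokens as $token) {
        if ($token[0] === '[' || $token[0] === '(') {
            $inner   = trim($token, '[]()?');
            $options = $token[0] === '['
                ? str_split($inner)       // character class
                : explode('|', $inner);   // alternatives in a group
            if (substr($token, -1) === '?') {
                $options[] = '';          // optional part may be absent
            }
        } else {
            $options = [$token];          // literal text is copied as-is
        }

        // Append every option to every string built so far.
        $next = [];
        foreach ($results as $prefix) {
            foreach ($options as $option) {
                $next[] = $prefix . $option;
            }
        }
        $results = $next;
    }

    return $results;
}

$possibleMatches = parseRegexPattern('[ct]hun(k|der)(s|ed|ing)?');
print_r($possibleMatches);
```

The order of the elements differs from the listing in the question, which the question says does not matter.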

Additional functionality

Expanding nested groups

In use, you would put this before the preg_match_all call.

$regex        = 'This happen(s|ed) to (be(come)?|hav(e|ing)) test case 1?';

echo preg_replace_callback("/(\(|\|)(\w+)(?:\(([\w\|]+)\)\??)/", function($array){
    $output = explode("|", $array[3]);
    if ($array[0][-1] === "?") {
        $output[] = "";
    }
    foreach ($output as &$option) {
        $option = $array[2] . $option;
    }
    return $array[1] . implode("|", $output);
}, $regex), PHP_EOL;

Output:

This happen(s|ed) to (become|be|have|having) test case 1?

Matching single letters

The gist of this is to update the regex:

$matchPattern = "/(?:(\[\w+\]|\([\w|]+\))\??|(\w\?))/";

and add an else to the prepOptions function:

} else {
    $array = [$cleanString];
}
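To illustrate what the extra `(\w\?)` alternative picks up, here is a small sketch run against a shortened input string. The input string and the variable names `$token` and `$options` are made up for this example.

```php
<?php
$matchPattern = '/(?:(\[\w+\]|\([\w|]+\))\??|(\w\?))/';

preg_match_all($matchPattern, 'happen(s|ed) test case 1?', $matches);
print_r($matches[0]); // the variable parts found: "(s|ed)" and "1?"

// The optional letter expands to the letter itself or to nothing,
// which is what the else branch plus the "?" check in prepOptions produce.
$token   = $matches[0][1];
$options = [rtrim($token, '?'), ''];
print_r($options);
```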

Full working example

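/**
 * Recursively substitute each option for the first remaining variable part
 * of $pattern and print the result once no variable parts are left.
 */
function printMatches($pattern, $array, $matchPattern)
{
    // Options for the first variable part still present in $pattern.
    $currentArray = array_shift($array);

    foreach ($currentArray as $option) {
        // Replace only the first match (limit 1) with the current option.
        $patternModified = preg_replace($matchPattern, $option, $pattern, 1);
        if (!count($array)) {
            echo $patternModified, PHP_EOL;
        } else {
            printMatches($patternModified, $array, $matchPattern);
        }
    }
}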

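function prepOptions($matches)
{
    $possibilities = [];

    foreach ($matches as $match) {
        // Strip the brackets and the optional marker, leaving only the options.
        $cleanString = preg_replace("/[\[\]\(\)\?]/", "", $match);

        if ($match[0] === "[") {
            $array = str_split($cleanString, 1);
        } elseif ($match[0] === "(") {
            $array = explode("|", $cleanString);
        } else {
            // A single optional letter such as "1?".
            $array = [$cleanString];
        }
        // A trailing "?" makes the part optional, so add the empty string.
        // (Negative string offsets need PHP 7.1 or later.)
        if ($match[-1] === "?") {
            $array[] = "";
        }
        $possibilities[] = $array;
    }
    return $possibilities;
}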

$regex        = 'This happen(s|ed) to (be(come)?|hav(e|ing)) test case 1?';
$matchPattern = "/(?:(\[\w+\]|\([\w|]+\))\??|(\w\?))/";

$regex = preg_replace_callback("/(\(|\|)(\w+)(?:\(([\w\|]+)\)\??)/", function($array){
    $output = explode("|", $array[3]);
    if ($array[0][-1] === "?") {
        $output[] = "";
    }
    foreach ($output as &$option) {
        $option = $array[2] . $option;
    }
    return $array[1] . implode("|", $output);
}, $regex);


preg_match_all($matchPattern, $regex, $matches);

printMatches(
    $regex,
    prepOptions($matches[0]),
    $matchPattern
);

Output:

This happens to become test case 1
This happens to become test case 
This happens to be test case 1
This happens to be test case 
This happens to have test case 1
This happens to have test case 
This happens to having test case 1
This happens to having test case 
This happened to become test case 1
This happened to become test case 
This happened to be test case 1
This happened to be test case 
This happened to have test case 1
This happened to have test case 
This happened to having test case 1
This happened to having test case 
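One way to sanity-check generated strings is to match them back against the original pattern. The sketch below does that for two lines from the output above plus one made-up counter-example. The anchoring with `^(?:...)$` and the `D` modifier is an addition for this check and is not part of the answer's code.

```php
<?php
$regex = 'This happen(s|ed) to (be(come)?|hav(e|ing)) test case 1?';

// Anchor the pattern so the whole string has to match, not just a part of it.
$anchored = '/^(?:' . $regex . ')$/D';

$samples = [
    'This happens to become test case 1' => true,  // from the generated output
    'This happened to having test case ' => true,  // from the generated output
    'This happens to has test case 1'    => false, // made-up counter-example
];

$allAsExpected = true;
foreach ($samples as $line => $expected) {
    $matched = preg_match($anchored, $line) === 1;
    if ($matched !== $expected) {
        $allAsExpected = false;
    }
    echo var_export($matched, true), "\t", $line, PHP_EOL;
}
```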
