PHP preg_match_all limit

Question

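I am using preg_match_all with a very long pattern.

When I run the code, I get this error:

Warning: preg_match_all(): Compilation failed: regular expression is too large at offset 707830

After searching, I found a suggested solution: increase the pcre.backtrack_limit and pcre.recursion_limit values in php.ini.

But after increasing the values and restarting Apache, I still get the same error. My PHP version is 5.3.8.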

Answer 1

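This error has nothing to do with the regex's performance; it is about the regex itself. Changing pcre.backtrack_limit and pcre.recursion_limit won't have any effect, because the regex never gets a chance to run. The problem is that the regex is too big, and the solution is to make it smaller, much smaller.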
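One common way to end up with a pattern this large is a generated alternation of many literal words (`word1|word2|...`). The sketch below assumes that is the case here, which the question does not confirm; the function name `match_words_in_chunks` and the default chunk size of 500 are illustrative. It compiles several small patterns built with `array_chunk()` instead of one huge one, so each pattern stays well below the compile limit.

```php
<?php
// Hypothetical helper (assumes the big pattern is an alternation of literal words):
// split the word list into chunks and compile one small pattern per chunk.
function match_words_in_chunks(array $words, $subject, $chunkSize = 500)
{
    $found = array();
    foreach (array_chunk($words, $chunkSize) as $chunk) {
        // Escape each word so metacharacters such as "." are matched literally.
        $quoted = array();
        foreach ($chunk as $word) {
            $quoted[] = preg_quote($word, '/');
        }
        $pattern = '/' . implode('|', $quoted) . '/';
        if (preg_match_all($pattern, $subject, $m)) {
            // $m[0] holds the full matches of this chunk's pattern.
            $found = array_merge($found, $m[0]);
        }
    }
    return $found;
}
```

Unlike a single alternation, the results are grouped by chunk rather than ordered by position in the subject, and matches found by different chunks may overlap in the subject text.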

Answer 2

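Increasing the PCRE backtrack and recursion limits may fix the problem, but it will still fail once the data size reaches the new limits (this does not scale well as the data grows).

Example: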

<?php 
// essential for huge PCREs
ini_set("pcre.backtrack_limit", "23001337");
ini_set("pcre.recursion_limit", "23001337");
// imagine your PCRE here...
?>

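To really solve the underlying problem, you have to optimize the expression and, if possible, split the complex expression into "parts" and move some of the logic into PHP. I hope you get the idea from this example: instead of trying to find a substructure directly with a single PCRE, it takes a more "iterative" approach and uses PHP to go deeper and deeper into the structure. Example: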

<?php
$html = file_get_contents("huge_input.html");

// first find all tables, and work on those later
$res = preg_match_all("!<table.*>(?P<content>.*)</table>!isU", $html, $table_matches);

if ($res) foreach($table_matches['content'] as $table_match) {  

    // now find all cells in each table that was found earlier ..
    $res = preg_match_all("!<td.*>(?P<content>.*)</td>!isU", $table_match, $cell_matches);

    if ($res) foreach($cell_matches['content'] as $cell_match) {

        // imagine going deeper and deeper into the structure here...
        echo "found a table cell! content: ", $cell_match, "\n";

    }    
}
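The same iterative idea can be pushed a step further with the `$offset` argument of `preg_match()`: instead of collecting every table with one `preg_match_all()` call, the sketch below walks the document one table at a time. The function name `cells_per_table` is made up for illustration, and it assumes tables are not nested, since the lazy `.*?` stops at the first `</table>`.

```php
<?php
// Sketch: visit one <table> at a time by passing an offset to preg_match(),
// then run a second, small pattern over that table's content only.
// Assumes tables are not nested (the lazy .*? stops at the first </table>).
function cells_per_table($html)
{
    $tables = array();
    $offset = 0;
    while (preg_match('!<table\b[^>]*>(.*?)</table>!is', $html, $m, PREG_OFFSET_CAPTURE, $offset)) {
        // $m[0][1] is the byte position of the whole match; resume right after it.
        $offset = $m[0][1] + strlen($m[0][0]);

        // Collect the contents of every <td> inside this table (may be an empty list).
        preg_match_all('!<td\b[^>]*>(.*?)</td>!is', $m[1][0], $cells);
        $tables[] = $cells[1];
    }
    return $tables;
}
```

Each call only ever compiles two short patterns, no matter how large the input document is.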

Answer 3

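I'm writing this answer because I got stuck on the same problem. As Alan Moore pointed out, adjusting the backtrack and recursion limits does not help.

The error occurs when the needle (the pattern) exceeds the maximum possible size, which is limited by the underlying PCRE library. So the error is not caused by PHP but by PCRE itself; it is error message #20, defined here: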

https://github.com/php/.../pcre_compile.c#L477

PHP simply prints the error text it receives from the PCRE library when compilation fails.
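Because PHP only passes PCRE's error text along, a script can capture that text instead of just printing it. The sketch below is one way to do that on PHP 5.3 and later; the function name `pcre_compile_error` is hypothetical, and it relies on `preg_match()` returning `false` on failure and on `error_get_last()` recording the warning even when it is suppressed with `@`.

```php
<?php
// Sketch: try to compile (and run) a pattern against an empty subject and return
// the error text PHP reports, or null if the pattern worked.
// Caveat: runtime failures (e.g. the backtrack limit) also return false but raise
// no warning; against an empty subject they are unlikely.
function pcre_compile_error($pattern)
{
    if (@preg_match($pattern, '') !== false) {
        return null;
    }
    $error = error_get_last();
    return $error !== null ? $error['message'] : 'unknown PCRE error';
}
```

For an oversized pattern the returned message should contain the same "Compilation failed: regular expression is too large" text as the warning in the question.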

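In my environment, however, the error appeared when I tried to use previously captured fragments as needles and those fragments were larger than 32 KB.

It can easily be tested with this simple script from PHP's CLI: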

<?php
// This script demonstrates the above error: it grows a literal needle by one
// character per iteration and prints the needle's length. Once the needle is
// too large, preg_match() emits the "regular expression is too large" warning
// and returns false, so the loop keeps going until the 64 KB cap.

$expand = "";
$needle = "_^b_";
while( ! preg_match( $needle, "Stack Exchange Demo Text" ) )
{
    // Stop after 64 KB of accumulated chunk needle
    // (adjust to 32 KB for a better illustration)
    if ( strlen($expand) > 1024*64 ) die();

    // Build the next needle from the accumulated chunk
    $expand .= "a";
    $needle = '_^'.$expand.'_ism';

    echo strlen($needle)."\n";
}
?>

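To fix the error, you have to shrink the generated needle, or, if you need to capture everything, use several preg_match calls with the additional offset parameter.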

<?php
    // $big_chunk, $subject, $subj and the $needle_of_... variables are placeholders.

    // Match a huge chunk by its first 20k characters and its last 5,
    // letting a lazy .*? span the part in between.
    // preg_quote() gets the '/' delimiter so slashes in the chunk are escaped,
    // and the s modifier lets .*? cross newlines.
    if ( 
        preg_match( 
            '/'.preg_quote( 
                    substr( $big_chunk, 0, 20*1024 ), '/' // 1st 20k chars
                ) 
                .'.*?'. 
                preg_quote( 
                    substr( $big_chunk, -5 ), '/' // last 5
                ) 
            .'/s', 
            $subject 
        ) 
    ) { 
        // do stuff
    }

    // Attempt to match all needle chunks in the text
    if ( preg_match( 
            $needle_of_1st_32kbytes_chunk, 
            $subj, $matches, 0, 
            32*1024*0 // offset -> 0
        )
        && preg_match( 
            $needle_of_2nd_32kbytes_chunk, 
            $subj, $matches, 0, 
            32*1024*1 // offset -> 32k
        )
        // && ... as many preg matches as needed
    ) {
        // do stuff
    }

    // it would be nicer to put the texts in a foreach loop iterating
    // over the existing chunks
?>

You get the idea.
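The foreach loop suggested in the last code comment could look roughly like this. It is a sketch under assumptions that go beyond the answer: the needle is treated as literal text (hence `preg_quote()`), the function name `chunked_literal_match` and the 16 KB chunk size are made up, and each piece is anchored with `\G` so it has to start exactly where the previous piece ended in the subject, beginning at `$start`.

```php
<?php
// Sketch: test whether $needle occurs verbatim in $subject at byte position $start,
// compiling one small pattern per piece instead of one huge pattern.
function chunked_literal_match($needle, $subject, $start = 0, $chunkSize = 16384)
{
    $pos = $start;
    foreach (str_split($needle, $chunkSize) as $piece) {
        // \G anchors the piece at the offset passed to preg_match(),
        // so each piece must begin exactly where the previous one ended.
        $pattern = '/\G' . preg_quote($piece, '/') . '/';
        if (preg_match($pattern, $subject, $m, 0, $pos) !== 1) {
            return false;
        }
        $pos += strlen($piece);
    }
    return true;
}
```

If the needle really is plain text, `strpos()` or `substr_compare()` avoid PCRE and its size limit entirely; the chunked form is mainly a way to keep each compiled pattern small.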

Although this answer is a bit laaaaate, I hope it still helps people who run into this problem without finding a good explanation of the error.

Question

3 answers

Answer 1
12 2011-11-25 12:41:40

Answer 2
7 accepted 2011-11-25 12:36:42

Answer 3
3 2016-02-24 14:14:53
