How to evaluate constraints using regular expressions? (PHP, regex)

Question

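So, suppose I want to accept strings like the following (|| separates the alternatives):
SomeColumn IN||<||>||= [123, 'hello', "wassup"]||123||'hello'||"yay!"
For example: MyValue IN ['value', 123] or MyInt > 123 -> I think you get the idea. Now, what bothers me is how to express this in a regex. I'm using PHP, and this is what I'm doing right now: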

        $temp = explode(';', $constraints);
        $matches = array();
        foreach ($temp as $condition) {
            preg_match('/(.+)[\t| ]+(IN|<|=|>|!)[\t| ]+([0-9]+|[.+]|.+)/', $condition, $matches[]);
        }
        foreach ($matches as $match) {
            if ($match[2] == 'IN') {
                preg_match('/(?:([0-9]+|".+"|\'.+\'))/', substr($match[3], 1, -1), $tempm);
                print_r($tempm);
            }
        }

I'd really appreciate any help out there; my regex skills are terrible.

Answer 1

I assume your input looks something like this:

$string = 'SomeColumn IN [123, \'hello\', "wassup"];SomeColumn < 123;SomeColumn = \'hello\';SomeColumn > 123;SomeColumn = "yay!";SomeColumn = [123, \'hello\', "wassup"]';

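If you use preg_match_all, you don't need explode or to build the matches array yourself. Note that the resulting two-dimensional array has its dimensions switched (it is grouped by capturing group rather than by statement), but that is usually desirable. Here's the code: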

preg_match_all('/(\w+)[\t ]+(IN|<|>|=|!)[\t ]+((\'[^\']*\'|"[^"]*"|\d+)|\[[\t ]*(?4)(?:[\t ]*,[\t ]*(?4))*[\t ]*\])/', $string, $matches);

$statements = $matches[0];
$columns = $matches[1];
$operators = $matches[2];
$values = $matches[3];
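If you would rather have one array per statement than one per capturing group, PHP's PREG_SET_ORDER flag gives you that layout. The sketch below assumes a shortened version of the sample input above and uses a hypothetical helper, splitElements(), to pull the individual literals out of an array value such as [123, 'hello', "wassup"]. Like the compact regex, it does not handle escaped quotes, and the quotes are left on string literals.

```php
<?php
// Shortened sample input (assumption: statements are separated by ';').
$string = 'SomeColumn IN [123, \'hello\', "wassup"];SomeColumn < 123;SomeColumn = "yay!"';

// Same compact pattern as above.
$pattern = '/(\w+)[\t ]+(IN|<|>|=|!)[\t ]+((\'[^\']*\'|"[^"]*"|\d+)|\[[\t ]*(?4)(?:[\t ]*,[\t ]*(?4))*[\t ]*\])/';

// PREG_SET_ORDER: $sets[$i] holds all groups of the i-th statement.
preg_match_all($pattern, $string, $sets, PREG_SET_ORDER);

// Hypothetical helper: list the literals inside a value. An array value such as
// [123, 'hello', "wassup"] yields three items; a scalar value yields one.
function splitElements(string $value): array
{
    preg_match_all('/\'[^\']*\'|"[^"]*"|\d+/', $value, $m);
    return $m[0];
}

$parsed = [];
foreach ($sets as $set) {
    $parsed[] = [
        'column'   => $set[1],
        'operator' => $set[2],
        'values'   => splitElements($set[3]),
    ];
}

print_r($parsed);
```

This keeps the single regex for finding statements and uses a second, much simpler one only inside array values, which is roughly what the second loop in the question was attempting.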

There will also be a $matches[4], but it has no real meaning; it is only used inside the regex. First, some things you did wrong in your attempt:

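(.+) consumes as many characters as possible, of any kind. So if a string value contains something that looks like IN 13, your first repetition may consume everything up to there and return it as the column. It also allows spaces and = in column names. There are two ways to fix this: either make the repetition "ungreedy" by appending ?, or, better, restrict the allowed characters so the match cannot run past the intended delimiter. In my regex I only allow letters, digits and underscores (\w) for the column identifier.
[\t| ]: this mixes up two concepts, alternation and character classes. What it actually does is "match a tab, a pipe, or a space". In a character class you just write all the characters without separating them. Alternatively, you could write (\t| ), which is equivalent in this case.
[.+]: I'm not sure what you wanted to accomplish with this, but it matches a literal . or a literal +. Again, it may be useful to restrict the allowed characters and to check that quotes are matched correctly (to avoid 'some string").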
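The three pitfalls above can be checked directly. In this small sketch the input "Name = 'a IN b'" is a made-up example: the pattern from the question splits it at the IN inside the string value, while restricting the column to \w+ stops at the first space.

```php
<?php
// Made-up condition whose string value contains " IN ".
$condition = "Name = 'a IN b'";

// The pattern from the question: greedy (.+) backtracks from the end, so it
// ends right before the LAST operator that still allows a match (the IN in the string).
preg_match('/(.+)[\t| ]+(IN|<|=|>|!)[\t| ]+([0-9]+|[.+]|.+)/', $condition, $greedy);

// Restricting the column to \w+ means it cannot run past the first space.
preg_match('/(\w+)[\t ]+(IN|<|=|>|!)[\t ]+(.+)/', $condition, $restricted);

// [\t| ] is one character class: tab, pipe or space. A pipe is accepted.
$pipeMatches = preg_match('/^[\t| ]$/', '|');

// [.+] is a class containing a literal dot and a literal plus, nothing more.
$dotMatches  = preg_match('/^[.+]$/', '.');
$textMatches = preg_match('/^[.+]$/', 'hello');

echo $greedy[1], "\n";     // Name = 'a
echo $restricted[1], "\n"; // Name
```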

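Now for an explanation of my own regex (you can also copy this version into your code and it will work the same; this way you have the explanation as comments in your code):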

preg_match_all('/
    (\w+)           # match an identifier and capture in $1
    [\t ]+          # one or more tabs or spaces
    (IN|<|>|=|!)    # the operator (capture in $2)
    [\t ]+          # one or more tabs or spaces
    (               # start of capturing group $3 (the value)
        (           # start of subpattern for single-valued literals (capturing group $4)
            \'      # literal quote
            [^\']*  # arbitrarily many non-quote characters, to avoid going past the end of the string
            \'      # literal quote
        |           # OR
            "[^"]*" # equivalent for double-quotes
        |           # OR
            \d+     # a number
        )           # end of subpattern for single-valued literals
    |               # OR (arrays follow)
        \[          # literal [
        [\t ]*      # zero or more tabs or spaces
        (?4)        # reuse subpattern no. 4 (any single-valued literal)
        (?:         # start non-capturing subpattern for further array elements
            [\t ]*  # zero or more tabs or spaces
            ,       # a literal comma
            [\t ]*  # zero or more tabs or spaces
            (?4)    # reuse subpattern no. 4 (any single-valued literal)
        )*          # end of additional array element; repeat zero or more times
        [\t ]*      # zero or more tabs or spaces
        \]          # literal ]
    )               # end of capturing group $3
    /x',              // x modifier: PCRE ignores unescaped whitespace and # comments outside character classes
    $string,
    $matches);

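This makes use of PCRE's recursion feature: you can reuse a subpattern with (?n), or the entire regex with (?R), where n is the same number you would use for a backreference.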
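The difference between a subroutine call such as (?1) and a backreference such as \1 is easy to miss. A minimal sketch, using a simplified literal pattern (numbers and single-quoted strings only) as an assumption: \1 requires the exact text that group 1 captured, whereas (?1) only requires text that matches group 1's pattern again.

```php
<?php
// Group 1 (simplified for this demo): a number or a single-quoted string.
$literal = '(\d+|\'[^\']*\')';

// Backreference \1: the second item must repeat the exact text group 1 captured.
$backref = '/^' . $literal . ', \1$/';

// Subroutine call (?1): the second item only has to match group 1's pattern again.
$subroutine = '/^' . $literal . ', (?1)$/';

$sameTextBackref = preg_match($backref, "123, 123");     // 1
$diffTextBackref = preg_match($backref, "123, 'x'");     // 0
$diffTextSubcall = preg_match($subroutine, "123, 'x'");  // 1

var_dump($sameTextBackref, $diffTextBackref, $diffTextSubcall);
```

This is why the array branch of the regex can use (?4) for every element: each element may be a different literal, as long as it is some valid literal.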

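I can think of three main ways this regex could be improved:

It doesn't allow floating-point numbers.
It doesn't allow escaped quotes (if your value is 'don\'t do this', I would only capture 'don\'). This can be solved with a negative lookbehind.
It doesn't allow empty arrays as values (this can easily be solved by wrapping all the elements in a subpattern and making it optional with ?).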
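For reference, here is one possible way to add all three improvements, as a sketch under the same input assumptions (statements separated by ;). Note that it swaps in a different technique for escaped quotes than the negative lookbehind mentioned above: each quoted string is matched as a run of either non-quote, non-backslash characters or a backslash followed by any character, which also copes with an escaped backslash right before the closing quote. Numbers with a decimal part are accepted, and [] is allowed. The pattern is written in a nowdoc so the backslashes need no extra PHP escaping.

```php
<?php
$pattern = <<<'REGEX'
/
    (\w+)                       # column identifier ($1)
    [\t ]+                      # one or more tabs or spaces
    (IN|<|>|=|!)                # operator ($2)
    [\t ]+
    (                           # value ($3)
        (                       # single-valued literal ($4)
            '(?:[^'\\]|\\.)*'   # single-quoted, backslash escapes allowed
          | "(?:[^"\\]|\\.)*"   # double-quoted, backslash escapes allowed
          | \d+(?:\.\d+)?       # integer or decimal number
        )
      | \[ [\t ]*               # array: opening bracket
        (?: (?4) (?: [\t ]* , [\t ]* (?4) )* )?   # elements are optional, so [] is allowed
        [\t ]* \]
    )
/x
REGEX;

// Made-up input exercising all three improvements.
$string = <<<'INPUT'
Msg = 'don\'t do this';Price > 9.99;Tags IN [];Ids IN [1, 2.5, "a"]
INPUT;

preg_match_all($pattern, $string, $matches);
print_r($matches[3]);
```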

I didn't include these because I wasn't sure they apply to your problem, and I think the regex is already complex enough to show here.

In general, regular expressions aren't powerful enough for proper language parsing. It's usually better to write a parser.
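To give an idea of what a hand-written parser for this grammar might look like, here is a minimal sketch in the recursive-descent style, with one method per grammar rule (PHP 7.4+ for the typed properties). The class name ConstraintParser and the array shape it returns are hypothetical. It follows the same grammar as the regex above (no empty arrays, no escaped quotes) and parses one constraint at a time, so you would still split on ; first. Unlike the regex, it reports where the input went wrong and strips the quotes from string literals.

```php
<?php
// Hypothetical hand-written parser for one constraint:
//   constraint := IDENT OP value
//   value      := literal | "[" literal ("," literal)* "]"
//   literal    := NUMBER | 'single-quoted' | "double-quoted"
final class ConstraintParser
{
    private string $src;
    private int $pos = 0;

    private function __construct(string $src)
    {
        $this->src = $src;
    }

    public static function parse(string $src): array
    {
        $p = new self($src);
        $column = $p->expect('/\G[\t ]*(\w+)/', 'column name');
        $op = $p->expect('/\G[\t ]+(IN|<|>|=|!)/', 'operator');
        $p->expect('/\G[\t ]+()/', 'whitespace after operator');
        $value = $p->value();
        $p->expect('/\G[\t ]*()$/', 'end of input');
        return ['column' => $column, 'operator' => $op, 'value' => $value];
    }

    // value := literal | "[" literal ("," literal)* "]"
    private function value()
    {
        if (!$this->peek('/\G\[/')) {
            return $this->literal();
        }
        $this->expect('/\G(\[)/', '[');
        $items = [$this->literal()];
        while ($this->peek('/\G[\t ]*,/')) {
            $this->expect('/\G[\t ]*(,)/', ',');
            $items[] = $this->literal();
        }
        $this->expect('/\G[\t ]*(\])/', ']');
        return $items;
    }

    // Numbers become ints; quoted strings lose their surrounding quotes.
    private function literal()
    {
        $raw = $this->expect('/\G[\t ]*(\d+|\'[^\']*\'|"[^"]*")/', 'literal');
        return ($raw[0] === '"' || $raw[0] === "'") ? substr($raw, 1, -1) : (int) $raw;
    }

    // True if $regex (anchored with \G) matches at the current position.
    private function peek(string $regex): bool
    {
        return preg_match($regex, $this->src, $m, 0, $this->pos) === 1;
    }

    // Match $regex at the current position, advance past it, return group 1.
    private function expect(string $regex, string $what): string
    {
        if (preg_match($regex, $this->src, $m, 0, $this->pos) !== 1) {
            throw new InvalidArgumentException("Expected $what at offset {$this->pos}");
        }
        $this->pos += strlen($m[0]);
        return $m[1];
    }
}

print_r(ConstraintParser::parse("MyValue IN ['value', 123]"));
```

Because each grammar rule is one small method, supporting floats or empty arrays would mean changing literal() or value() rather than one large pattern.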

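Since you said your regex skills are terrible... Although regular expressions can look like a lot of black magic because of their unusual syntax, they are not that hard to understand if you take some time to learn their basic concepts. I can recommend this tutorial. It really takes you all the way.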

Solution 1
0 (accepted) 2012-11-13 22:11:15
