如何使用正则表达式评估约束？（PHP，正则表达式）

Question

所以，假设我想接受如下字符串
SomeColumn IN||<||>||= [123, 'hello', "wassup"]||123||'hello'||"yay!"
例如： MyValue IN ['value', 123]或MyInt > 123 -> 我想你明白了。 现在，困扰我的是如何在正则表达式中表达这个？ 我正在使用 PHP，这就是我现在正在做的事情：

        $temp = explode(';', $constraints);
        $matches = array();
        foreach ($temp as $condition) {
            preg_match('/(.+)[\t| ]+(IN|<|=|>|!)[\t| ]+([0-9]+|[.+]|.+)/', $condition, $matches[]);
        }
        foreach ($matches as $match) {
            if ($match[2] == 'IN') {
                preg_match('/(?:([0-9]+|".+"|\'.+\'))/', substr($match[3], 1, -1), $tempm);
                print_r($tempm);
            }
        }

真的很感谢那里的任何帮助，我的正则表达式很糟糕。

Answer 1

我假设您的输入看起来与此类似：

$string = 'SomeColumn IN [123, \'hello\', "wassup"];SomeColumn < 123;SomeColumn = \'hello\';SomeColumn > 123;SomeColumn = "yay!";SomeColumn = [123, \'hello\', "wassup"]';

如果您使用preg_match_all则不需要explode或自己构建匹配。 请注意，生成的二维数组将切换维度，但这通常是可取的。 这是代码：

preg_match_all('/(\w+)[\t ]+(IN|<|>|=|!)[\t ]+((\'[^\']*\'|"[^"]*"|\d+)|\[[\t ]*(?4)(?:[\t ]*,[\t ]*(?4))*[\t ]*\])/', $string, $matches);

$statements = $matches[0];
$columns = $matches[1];
$operators = $matches[2];
$values = $matches[3];

也会有一个$matches[4]但它没有真正的意义，只在正则表达式中使用。 首先，您在尝试中做错了一些事情：

(.+)会消耗尽可能多的，任何字符。 因此，如果您在字符串值中有一些看起来像IN 13那么您的第一次重复可能会消耗所有内容，并将其作为列返回。 它还允许在列名中使用空格和= 。 有两种方法可以解决这个问题。 要么通过附加使重复“不贪婪” ? 或者，更好的是，限制允许的字符，这样您就不能超过所需的分隔符。 在我的正则表达式中，我只允许使用字母、数字和下划线 ( \\w ) 作为列标识符。
[\\t| ] [\\t| ]这混淆了两个概念：交替和字符类。 它的作用是“匹配制表符、管道或空格”。 在字符类中，您只需编写所有字符而无需对其进行分隔。 或者，您可以编写(\\t| ) ，这在这种情况下是等效的。
[.+]我不知道你想用这个来完成什么，但它匹配一个文字. 或文字+ 。 再次限制允许的字符并检查引号的正确匹配可能很有用（以避免'some string" ）

现在解释一下我自己的正则表达式（您也可以将其复制到您的代码中，它会正常工作；另外，您在代码中将解释作为注释）：

preg_match_all('/
    (\w+)           # match an identifier and capture in $1
    [\t ]+          # one or more tabs or spaces
    (IN|<|>|=|!)    # the operator (capture in $2)
    [\t ]+          # one or more tabs or spaces
    (               # start of capturing group $3 (the value)
        (           # start of subpattern for single-valued literals (capturing group $4)
            \'      # literal quote
            [^\']*  # arbitrarily many non-quote characters, to avoid going past the end of the string
            \'      # literal quote
        |           # OR
            "[^"]*" # equivalent for double-quotes
        |           # OR
            \d+     # a number
        )           # end of subpattern for single-valued literals
    |               # OR (arrays follow)
        \[          # literal [
        [\t ]*      # zero or more tabs or spaces
        (?4)        # reuse subpattern no. 4 (any single-valued literal)
        (?:         # start non-capturing subpattern for further array elements
            [\t ]*  # zero or more tabs or spaces
            ,       # a literal comma
            [\t ]*  # zero or more tabs or spaces
            (?4)    # reuse subpattern no. 4 (any single-valued literal)
        )*          # end of additional array element; repeat zero or more times
        [\t ]*      # zero or more tabs or spaces
        \]          # literal ]
    )               # end of capturing group $3
    /',
    $string,
    $matches);

这利用了 PCRE 的递归功能，您可以在其中使用(?n)重用子模式（或整个正则表达式(?n) （其中n只是您也将用于反向引用的数字）。

我可以想到可以用这个正则表达式改进的三个主要方面：

它不允许浮点数
它不允许转义引号（如果您的值是'don\\'t do this' ，我只会捕获'don\\' ）。 这可以使用否定的lookbehind来解决。
它不允许将空数组作为值（这可以通过将所有参数包装在一个子模式中并使用?使其可选来轻松解决）

我没有包括这些，因为我不确定它们是否适用于您的问题，而且我认为正则表达式已经足够复杂，可以在这里展示。

通常正则表达式的功能不足以进行正确的语言解析。 通常最好编写解析器。

既然你说你的正则表达式很糟糕......虽然正则表达式由于其不常见的语法而看起来像是很多黑魔法，但它们并不难理解，如果你花点时间了解一下它们的基本概念。 我可以推荐这个教程。 它真的带你一路走来！

如何使用正则表达式评估约束？（PHP，正则表达式）

问题描述

1 个解决方案

解决方案1
0 已采纳 2012-11-13 22:11:15

如何使用正则表达式评估约束？ （PHP，正则表达式）

问题描述

1 个解决方案

解决方案1 0 已采纳 2012-11-13 22:11:15

如何使用正则表达式评估约束？（PHP，正则表达式）

解决方案1
0 已采纳 2012-11-13 22:11:15