
php regex: Use quotes for match, but don't capture them

I'm unsure whether I should be using preg_match, preg_match_all, or preg_split with delimiter capture (PREG_SPLIT_DELIM_CAPTURE). I'm also unsure of the correct regex.

Given the following:

$string = " ok 'that\\'s cool' \"yeah that's \\\"cool\\\"\"";

I want to get an array with the following elements:

[0] = "ok"
[1] = "that\'s"
[2] = "yeah that's \"cool\""

You cannot do this with a regular expression because you're trying to parse a non-context-free grammar. Write a parser.

Outline:

  • Read the input character by character; if you see a backslash (\), remember it.
  • If you see a " or ', check whether the previous character was a backslash. If it was not, the quote is a delimiter: this is your delimiting condition.
  • Record all the tokens this way.
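The outline above can be sketched as a small PHP tokenizer. This is an illustration, not the answerer's code: the function name `tokenize` is hypothetical, and the sketch assumes that whitespace outside quotes separates tokens, that a backslash escapes the next character and is kept verbatim, and that the enclosing quotes are dropped. An unclosed quote simply runs to the end of the input.

```php
<?php
// Hypothetical tokenizer sketching the outline above (not the answerer's code).
// Assumptions: whitespace outside quotes separates tokens, a backslash escapes
// the next character and is kept verbatim, and the enclosing quotes are dropped.
function tokenize(string $input): array
{
    $tokens  = [];
    $current = '';
    $inWord  = false; // collecting a bare (unquoted) word
    $quote   = null;  // the open quote character, or null outside quotes
    $escaped = false; // previous character was an unescaped backslash

    $len = strlen($input);
    for ($i = 0; $i < $len; $i++) {
        $c = $input[$i];

        if ($escaped) {
            // Escaped character: always literal, never a delimiter.
            $current .= $c;
            $escaped = false;
        } elseif ($c === '\\') {
            // Remember the backslash and keep it in the token.
            $current .= $c;
            $escaped = true;
            if ($quote === null) {
                $inWord = true;
            }
        } elseif ($quote !== null) {
            if ($c === $quote) {
                // Unescaped matching quote closes the quoted token.
                $tokens[] = $current;
                $current = '';
                $quote = null;
            } else {
                $current .= $c;
            }
        } elseif ($c === '"' || $c === "'") {
            // Unescaped quote: end any bare word, then open a quoted token.
            if ($inWord) {
                $tokens[] = $current;
                $current = '';
                $inWord = false;
            }
            $quote = $c;
        } elseif (strpos(" \t\r\n", $c) !== false) {
            // Whitespace outside quotes ends a bare word.
            if ($inWord) {
                $tokens[] = $current;
                $current = '';
                $inWord = false;
            }
        } else {
            $current .= $c;
            $inWord = true;
        }
    }
    // Flush a trailing bare word or an unclosed quoted token.
    if ($inWord || $quote !== null) {
        $tokens[] = $current;
    }
    return $tokens;
}

$string = " ok 'that\\'s cool' \"yeah that's \\\"cool\\\"\"";
print_r(tokenize($string));
```

Keeping the backslash in the token (rather than consuming it) matches the question's desired output; dropping it would only require skipping the `$current .= $c;` line in the backslash branch.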

Your desired result set seems to trim spaces, and you also lost a couple of the backslashes. Perhaps this is a mistake, but it can be important.

I would expect:

[0] = " ok " // <-- spaces here
[1] = "that\\'s cool"
[2] = " \"yeah that's \\\"cool\\\"\"" // leading space here, and \" remains
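Whether any backslashes were lost depends on whether the listed values are read as PHP double-quoted source literals or as the raw runtime text. A quick way to see what the question's literal actually contains at runtime (the variable `$same` is only an illustration):

```php
<?php
// Shows the runtime contents of the question's double-quoted literal.
// Inside "...", \\ is one backslash and \" is a plain double quote.
$string = " ok 'that\\'s cool' \"yeah that's \\\"cool\\\"\"";

// The same value as a single-quoted literal, where only \\ and \' are escapes.
$same = ' ok \'that\\\'s cool\' "yeah that\'s \\"cool\\""';

echo $string, "\n";                    // the raw text, with its leading space
var_dump($string === $same);           // bool(true)
var_dump(substr_count($string, '\\')); // int(3): one backslash per escaped quote
```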

Actually, you might be surprised to find that you can do this with a regex:

preg_match_all("((?|\"((?:\\\\.|[^\"])+)\"|'((?:\\\\.|[^'])+)'|(\w+)))",$string,$m);

The desired result array will be in $m[1].
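To try the one-liner in isolation, here it is wrapped in a helper; the name `split_quoted` is hypothetical and the pattern literal is copied unchanged. The outer ( ) act as the pattern delimiters, and the branch-reset group (?| ... ) numbers the capture in every alternative as group 1, which is why all the tokens land in $m[1].

```php
<?php
// Hypothetical helper wrapping the one-line regex above; the pattern literal
// is unchanged. The outer ( ) are the delimiters, and the branch-reset group
// (?| ... ) makes each alternative's capture group number 1.
function split_quoted(string $subject): array
{
    preg_match_all("((?|\"((?:\\\\.|[^\"])+)\"|'((?:\\\\.|[^'])+)'|(\w+)))", $subject, $m);
    return $m[1]; // quote contents without the quotes, or a bare \w+ word
}

$string = " ok 'that\\'s cool' \"yeah that's \\\"cool\\\"\"";
print_r(split_quoted($string));
```

Note that bare words are limited to \w characters, so an input such as foo-bar comes back as foo and bar, and an empty quoted string ('') is skipped because both quoted alternatives require at least one character.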

You can do it with a regex:

$pattern = <<<'LOD'
~
(?J) 

# Definitions #
(?(DEFINE)
  (?<ens> (?> \\{2} )+ ) # even number of backslashes

  (?<sqc> (?> [^\s'\\]++  | \s++ (?!'|$)    | \g<ens> | \\ '?+    )+ ) # single quotes content
  (?<dqc> (?> [^\s"\\]++  | \s++ (?!"|$)    | \g<ens> | \\ "?+    )+ ) # double quotes content
  (?<con> (?> [^\s"'\\]++ | \s++ (?!["']|$) | \g<ens> | \\ ["']?+ )+ ) # content
)
# Pattern #
    \s*+ (?<res> \g<con>)
| ' \s*+ (?<res> \g<sqc>) \s*+ '?+
| " \s*+ (?<res> \g<dqc>) \s*+ "?+ 
~x
LOD;
$subject = " ok 'that\\'s cool' \"yeah that's \\\"cool\\\"\"";

preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);
foreach($matches as $match) {
    var_dump($match['res']);
}

I chose to trim spaces in all results, so " abcd " gives abcd. This pattern allows as many backslashes as you want, anywhere you want. If a quoted string is not closed before the end of the subject, the end of the subject is treated as the closing quote (this is why I made the closing quotes optional). So abcd " ef'gh gives you abcd and ef'gh.
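To check the behavior described above, the same pattern can be wrapped in a helper. The name `split_trimmed` is hypothetical; the comments are removed but the pattern is otherwise unchanged, and array_column collects the res group from each match set.

```php
<?php
// Hypothetical helper around this answer's pattern (comments removed).
// (?J) allows the three alternatives to share the group name "res".
function split_trimmed(string $subject): array
{
    $pattern = <<<'LOD'
~
(?J)
(?(DEFINE)
  (?<ens> (?> \\{2} )+ )
  (?<sqc> (?> [^\s'\\]++  | \s++ (?!'|$)    | \g<ens> | \\ '?+    )+ )
  (?<dqc> (?> [^\s"\\]++  | \s++ (?!"|$)    | \g<ens> | \\ "?+    )+ )
  (?<con> (?> [^\s"'\\]++ | \s++ (?!["']|$) | \g<ens> | \\ ["']?+ )+ )
)
    \s*+ (?<res> \g<con>)
| ' \s*+ (?<res> \g<sqc>) \s*+ '?+
| " \s*+ (?<res> \g<dqc>) \s*+ "?+
~x
LOD;
    preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);
    return array_column($matches, 'res'); // one "res" value per match
}

print_r(split_trimmed(' abcd '));        // surrounding spaces are trimmed
print_r(split_trimmed('abcd " ef\'gh')); // unclosed quote runs to the end
```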
