php regex: Use quotes for match, but don't capture them
I'm unsure if I should be using preg_match, preg_match_all, or preg_split with delimiter capture. I'm also unsure of the correct regex.
Given the following:
$string = " ok 'that\\'s cool' \"yeah that's \\\"cool\\\"\"";
I want to get an array with the following elements:
[0] = "ok"
[1] = "that\'s"
[2] = "yeah that's \"cool\""
You cannot do this with a regular expression because you're trying to parse a non-context-free grammar. Write a parser.
Outline:
Read character by character; if you encounter a \, remember it.
If you encounter a " or ', check if the previous character was \. You now have your delimiting condition.
Your desired result set seems to trim spaces, and you also lost a couple of the backslashes; perhaps this is a mistake, but it can be important.
I would expect:
[0] = " ok " // <-- spaces here
[1] = "that\\'s cool"
[2] = " \"yeah that's \\\"cool\\\"\"" // leading space here, and \" remains
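The outline above can be sketched as a small character-by-character scanner. This is only an illustration: the function name splitQuoted is hypothetical, and the sketch assumes that whitespace outside quotes separates tokens and is dropped (the question's desired output, not the untrimmed one shown just above), while backslashes are kept exactly as written.

```php
<?php
// Hypothetical helper, not taken from any of the answers: scans $input one
// character at a time and returns bare words and quoted strings as tokens.
function splitQuoted(string $input): array
{
    $tokens  = [];
    $current = '';
    $quote   = null;   // quote character we are currently inside, or null
    $escaped = false;  // true when the previous character was an unescaped backslash
    $length  = strlen($input);

    for ($i = 0; $i < $length; $i++) {
        $ch = $input[$i];

        if ($escaped) {
            // The character after a backslash is kept verbatim.
            $current .= $ch;
            $escaped = false;
        } elseif ($ch === '\\') {
            // Remember the backslash and keep it in the token.
            $current .= $ch;
            $escaped = true;
        } elseif ($quote !== null) {
            if ($ch === $quote) {
                // Unescaped closing quote: the token is complete.
                $tokens[] = $current;
                $current  = '';
                $quote    = null;
            } else {
                $current .= $ch;
            }
        } elseif ($ch === '"' || $ch === "'") {
            // Unescaped opening quote: flush any pending bare word first.
            if ($current !== '') {
                $tokens[] = $current;
                $current  = '';
            }
            $quote = $ch;
        } elseif (strpos(" \t\r\n", $ch) !== false) {
            // Whitespace outside quotes ends a bare word (assumption: trimmed).
            if ($current !== '') {
                $tokens[] = $current;
                $current  = '';
            }
        } else {
            $current .= $ch;
        }
    }

    // An unterminated quote or trailing word is emitted as-is.
    if ($current !== '') {
        $tokens[] = $current;
    }
    return $tokens;
}

$string = " ok 'that\\'s cool' \"yeah that's \\\"cool\\\"\"";
print_r(splitQuoted($string));
```

For the sample string this yields three tokens: ok, that\'s cool, and yeah that's \"cool\". To keep the surrounding spaces instead, the whitespace branch would append to $current rather than flush it.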
Actually, you might be surprised to find that you can do this in regex:
preg_match_all("((?|\"((?:\\\\.|[^\"])+)\"|'((?:\\\\.|[^'])+)'|(\w+)))",$string,$m);
The desired result array will be in $m[1].
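The one-liner is hard to read because every backslash is escaped once for PHP and once for the regex. As a sketch, the same pattern can be written in a nowdoc with the x flag so each branch can carry a comment; the ~ delimiter and the layout are choices made here for readability, not part of the original answer. The (?| ... ) construct is a PCRE branch reset group, which is why every alternative stores its capture in group 1.

```php
<?php
// Same pattern as the one-liner above, written in a nowdoc so no PHP-level
// escaping is needed. The x flag lets the pattern contain whitespace and comments.
$pattern = <<<'RE'
~
(?|                             # branch reset: each alternative fills group 1
    " ( (?: \\. | [^"] )+ ) "   # double-quoted: escaped char or any non-quote char
  | ' ( (?: \\. | [^'] )+ ) '   # single-quoted: same idea
  | ( \w+ )                     # bare word
)
~x
RE;

$string = " ok 'that\\'s cool' \"yeah that's \\\"cool\\\"\"";
preg_match_all($pattern, $string, $m);

// $m[0] holds the full matches including quotes; $m[1] holds only the content.
print_r($m[1]);
```

For the sample string, $m[1] contains ok, that\'s cool, and yeah that's \"cool\", with the surrounding quotes matched but not captured.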
You can do it with a regex:
$pattern = <<<'LOD'
~
(?J)
# Definitions #
(?(DEFINE)
(?<ens> (?> \\{2} )+ ) # even number of backslashes
(?<sqc> (?> [^\s'\\]++ | \s++ (?!'|$) | \g<ens> | \\ '?+ )+ ) # single quotes content
(?<dqc> (?> [^\s"\\]++ | \s++ (?!"|$) | \g<ens> | \\ "?+ )+ ) # double quotes content
(?<con> (?> [^\s"'\\]++ | \s++ (?!["']|$) | \g<ens> | \\ ["']?+ )+ ) # content
)
# Pattern #
\s*+ (?<res> \g<con>)
| ' \s*+ (?<res> \g<sqc>) \s*+ '?+
| " \s*+ (?<res> \g<dqc>) \s*+ "?+
~x
LOD;
$subject = " ok 'that\\'s cool' \"yeah that's \\\"cool\\\"\"";
preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);
foreach($matches as $match) {
var_dump($match['res']);
}
I made the choice to trim spaces in all results, so " abcd " will give abcd. This pattern allows as many backslashes as you want, anywhere you want. If a quoted string is not closed at the end of the string, the end of the string is treated as the closing quote (this is why I made the closing quotes optional). So abcd " ef'gh will give you abcd and ef'gh.