Extracting "expression syntax" with PCRE

We're working on a project that supports interpolating expressions in strings. Under the hood we use Symfony's Expression Language to parse and evaluate the expressions in context, but extracting the expressions from the strings is our own job.

I should preface this by saying that I am no expert at regular expressions. My working knowledge is limited, so the following regex will probably look clunky and inelegant:

/\${(.*?)}(?=[\s\w\-_\/\\:;,.?!()|"\]&]|$)/

The theory is as follows:

  1. An expression starts with ${. This is the starting anchor.
  2. Match anything in between (lazily, so as little as possible).
  3. The expression ends with a closing } that is followed either by the end of the input ($) or by one of the characters in the lookahead character list.

Consider an expression that looks like this:

His name is "${name}", and he's a "${thing}".

The regex successfully matches the expressions name and thing, and we then replace them with values from a value object.
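To make that behavior concrete, here is a minimal Python sketch of the question's regex. Python's re module is not PCRE, but this particular pattern only uses features both support (lazy quantifier, lookahead, $). The interpolate function and the dict of values are hypothetical stand-ins for the project's value object; real evaluation through Symfony's Expression Language is not modeled.

```python
import re

# The question's pattern, moved from PCRE /.../ delimiters into a Python raw
# string. Group 1 is the lazily matched body between "${" and "}"; the
# lookahead requires the "}" to be followed by whitespace, a word character,
# one of the listed punctuation characters, or the end of the input.
LOOKAHEAD_PATTERN = re.compile(r'\$\{(.*?)\}(?=[\s\w\-_/\\:;,.?!()|"\]&]|$)')


def interpolate(template, values):
    """Replace each ${key} with values[key].

    `values` is a plain dict standing in (hypothetically) for the value object.
    """
    return LOOKAHEAD_PATTERN.sub(lambda m: str(values[m.group(1)]), template)


sentence = 'His name is "${name}", and he\'s a "${thing}".'
print(LOOKAHEAD_PATTERN.findall(sentence))  # ['name', 'thing']
print(interpolate(sentence, {"name": "Bob", "thing": "wizard"}))
# His name is "Bob", and he's a "wizard".
```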

However, users can also pass in full expressions and values, such as this one:

${{"name": "Pack Rat", "mana_cost": "{1}{B}", "cmc": 2}}

That is, the expression should evaluate to a JSON object. Here the regex fails: it stops at the }" sequence inside {1}{B}, so it matches only {"name": "Pack Rat", "mana_cost": "{1}{B. Removing " from the lookahead character list fixes the JSON case, but then the regex no longer extracts the two expressions from the plain sentence.
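The same Python transcription reproduces the failure and shows the trade-off of dropping " from the lookahead class. The two variants below are illustrative: WITH_QUOTE is the question's pattern, and WITHOUT_QUOTE is the same pattern with only the double quote removed from the character class.

```python
import re

# The question's pattern (the lookahead class includes ").
WITH_QUOTE = re.compile(r'\$\{(.*?)\}(?=[\s\w\-_/\\:;,.?!()|"\]&]|$)')
# Identical except that " is removed from the lookahead class.
WITHOUT_QUOTE = re.compile(r'\$\{(.*?)\}(?=[\s\w\-_/\\:;,.?!()|\]&]|$)')

json_expr = '${{"name": "Pack Rat", "mana_cost": "{1}{B}", "cmc": 2}}'
sentence = 'His name is "${name}", and he\'s a "${thing}".'

# With ": the lazy body stops at the first "}" followed by a quote, inside {B}.
print(WITH_QUOTE.findall(json_expr))
# ['{"name": "Pack Rat", "mana_cost": "{1}{B']

# Without ": only the final "}" (followed by the end of input) qualifies.
print(WITHOUT_QUOTE.findall(json_expr))
# ['{"name": "Pack Rat", "mana_cost": "{1}{B}", "cmc": 2}']

# ...but now neither "}" in the sentence is followed by an allowed character.
print(WITHOUT_QUOTE.findall(sentence))
# []
```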

Is it possible to keep this expression parser from stopping prematurely? Or is this beyond the scope of a single regular expression?

You could use a recursive pattern:

\$(\{(?:[^{}]+|(?1))+\})

Capture group 1 then contains the whole braced expression (outer braces included), which you can analyze further. See a demo on regex101.com.


In detail, the pattern reads:

\$                       # "$" literally
(                        # opening bracket -> capture group 1
    \{                   # "{" literally
        (?:[^{}]+|(?1))+ # one or more items: a run of characters that are neither { nor }, or group 1 again (recursion)
    \}                   # "}" literally
)                        # end of group 1
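PHP's preg_* functions are built on PCRE, so the recursive pattern can be used there as-is. Python's standard re module does not support (?1) recursion, so the sketch below swaps in an explicit depth counter that should accept the same ${...} spans as the pattern, including its requirement that every brace pair contain at least one character. The function names extract_balanced and _match_group are hypothetical.

```python
from typing import List, Optional


def _match_group(text: str, start: int) -> Optional[int]:
    """Given text[start] == "{", return the index just past its balancing "}".

    Returns None if the braces never balance, or if any "{}" pair is empty
    (the pattern's "+" requires at least one item inside every pair).
    """
    depth = 0
    for j in range(start, len(text)):
        ch = text[j]
        if ch == "{":
            depth += 1
        elif ch == "}":
            if text[j - 1] == "{":
                return None  # empty pair such as "{}"
            depth -= 1
            if depth == 0:
                return j + 1
    return None  # reached the end with unclosed braces


def extract_balanced(text: str) -> List[str]:
    """Return group 1 of every ${...} match: the braced body, outer braces included."""
    results = []
    i = 0
    while i < len(text) - 1:
        if text[i] == "$" and text[i + 1] == "{":
            end = _match_group(text, i + 1)
            if end is not None:
                results.append(text[i + 1:end])
                i = end  # resume after the match, like a global regex search
                continue
        i += 1  # no match at this position; try the next one
    return results


json_expr = '${{"name": "Pack Rat", "mana_cost": "{1}{B}", "cmc": 2}}'
sentence = 'His name is "${name}", and he\'s a "${thing}".'
print(extract_balanced(json_expr))
# ['{{"name": "Pack Rat", "mana_cost": "{1}{B}", "cmc": 2}}']
print(extract_balanced(sentence))  # ['{name}', '{thing}']
```

Stripping the outer braces (match[1:-1]) leaves the expression text to hand to the evaluator. Note that neither the recursive pattern nor this counter understands JSON string literals, so an unbalanced brace inside a quoted value (for example "mana_cost": "{") still breaks extraction.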
