Correcting an illegal PCRE regex in PHP

Update 5/26

I've fixed the behavior of the regular expressions that were previously contained in this question, but as others have mentioned, my syntax still isn't correct. Apparently the only reason the patterns compile is that PHP's preg_* family of functions overlooks my mistakes.

I'm definitely a PCRE novice, so I'm trying to understand what mistakes are present so that I can fix them. I'm also open to critique of the design/approach. As others have suggested, I am going to build in compatibility with JSON and YAML, but I'd like to finish this home-brewed parser first, since I have it working and only need to work on the expression syntax (I think).

Here are all of the preg_match_all references and the one preg_replace reference extracted from the whole page of code:

// matches the outside container of objects {: and :}
$regex = preg_match_all('/\s\{:([^\}]+):\}/i', $this->html, $HTMLObjects);

// double checks that the object container is removed
$markup = preg_replace('/[\{:]([^\}]+):\}/i', '$1', $markup);

// matches all dynamic attributes (those containing bracketed data)
$dynamicRegEx = preg_match_all('/[\n]+([a-z0-9_\-\s]+)\[([^\]]+)\]/', $markup, $dynamicMatches);

// matches all static attributes (simple colon-separated attributes)
$staticRegEx = preg_match_all('/([^:]+):([^\n]+)/', $staticMarkup, $staticMatches);

If you'd like to see the preg_match_all and preg_replace references in context so that you can comment on or critique that as well, you can see the containing source file by following the link below.

Note: viewing the source code of the page makes everything much more readable: http://mdl.fm/codeshare.php?htmlobject

Like I said, I have it functioning as it stands; I'm just asking for a review of my PCRE syntax so that it isn't illegal. However, if you have comments on the structure/design or anything else, I'm open to all suggestions.

(Rewritten to reflect new question)

The first regex is correct, but you don't need to escape } within a character class. Also, I usually exclude both braces in the negated character class to avoid matching nested objects: your regex would match {:foo {:bar:} in the string "{:foo {:bar:} baz:}" (when it is preceded by the whitespace that your leading \s requires), whereas mine would only match {:bar:} . The /i mode modifier is useless since there is no cased text in your regex.

// matches the outside container of objects {: and :}
$regex = preg_match_all('/\s\{:([^{}]+):\}/', $this->html, $HTMLObjects);
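
To make the nesting difference concrete, here is a minimal sketch that runs both versions of the pattern against the same input. The sample string and the variable names $html, $loose and $strict are made up for illustration; they are not taken from the original source file.

```php
<?php
// Hypothetical sample input. Note the space before "{:" that the leading \s requires.
$html = "<p> {:foo {:bar:} baz:}</p>";

// Original pattern: only "}" is excluded, so the capture can run across an inner "{:".
preg_match_all('/\s\{:([^\}]+):\}/i', $html, $loose);

// Revised pattern: both braces are excluded, so only the innermost object matches.
preg_match_all('/\s\{:([^{}]+):\}/', $html, $strict);

print_r($loose[1]);   // captured contents with the original pattern
print_r($strict[1]);  // captured contents with the revised pattern
```

With this input the original pattern should capture "foo {:bar", while the revised one captures only "bar".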

In your second regex, there is an incorrect character class at the start that needs to be removed: [\{:] matches a single character that is either { or : rather than the two-character sequence {: . Otherwise, it's the same as the first.

// double checks that the object container is removed
$markup = preg_replace('/\{:([^{}]+):\}/', '$1', $markup);
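
The effect of that stray character class is easiest to see on a small input. The following is only a sketch; the sample markup and the names $old and $new are assumptions made for the example.

```php
<?php
// Hypothetical sample markup containing one object container.
$markup = "{:title: Home:}";

// Original pattern: [\{:] consumes only ONE character ("{" here),
// so the ":" that follows it ends up inside the captured group.
$old = preg_replace('/[\{:]([^\}]+):\}/i', '$1', $markup);

// Revised pattern: the literal sequence "{:" is consumed before the capture starts.
$new = preg_replace('/\{:([^{}]+):\}/', '$1', $markup);

var_dump($old);  // result with the original pattern
var_dump($new);  // result with the revised pattern
```

Here the original pattern should leave a leading colon behind (":title: Home"), while the revised one yields "title: Home".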

Your third regex looks OK; there's another useless character class, though ( [\n]+ is the same as \n+ ). Again, I've included both brackets in the negated character class. I'm not sure why you've made it case-sensitive - shouldn't there be an /i modifier here?

// matches all dynamic attributes (those containing bracketed data)
$dynamicRegEx = preg_match_all('/\n+([a-z0-9_\-\s]+)\[([^\[\]]+)\]/i', $markup, $dynamicMatches);
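
As a rough illustration of what the /i modifier changes, the sketch below feeds both patterns an attribute name that starts with a capital letter. The input and the names $caseSensitive and $caseInsensitive are invented for the example.

```php
<?php
// Hypothetical markup: two dynamic attributes, one with an uppercase letter in its name.
$markup = "\nLinks[a, b]\nclass[nav]";

// Original pattern: without /i, [a-z0-9_\-\s] rejects the "L" in "Links".
preg_match_all('/[\n]+([a-z0-9_\-\s]+)\[([^\]]+)\]/', $markup, $caseSensitive);

// Revised pattern: \n+ replaces [\n]+, both brackets are excluded, and /i is added.
preg_match_all('/\n+([a-z0-9_\-\s]+)\[([^\[\]]+)\]/i', $markup, $caseInsensitive);

print_r($caseSensitive[1]);    // attribute names found without /i
print_r($caseInsensitive[1]);  // attribute names found with /i
print_r($caseInsensitive[2]);  // bracketed data found with /i
```

The original pattern should find only "class" here, while the revised one finds both "Links" and "class". If attribute names are meant to be lowercase only, dropping /i again is a reasonable choice.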

The last regex is OK, but its first match will always run from the very first character of the string until the first colon, even across line breaks (and then on to the rest of that line). I think I would add a newline character to the first negated character class to make sure that can't happen:

// matches all static attributes (simple colon-separated attributes)
$staticRegEx = preg_match_all('/([^\n:]+):([^\n]+)/', $staticMarkup, $staticMatches);
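
A small sketch of the difference, assuming an input whose first line contains no colon. The sample text and the names $old and $new are hypothetical.

```php
<?php
// Hypothetical static markup; the first line has no colon at all.
$staticMarkup = "heading\nid: main\nclass: wide";

// Original pattern: [^:]+ may cross newlines, so the first key absorbs "heading\n".
preg_match_all('/([^:]+):([^\n]+)/', $staticMarkup, $old);

// Revised pattern: [^\n:]+ keeps every key on a single line.
preg_match_all('/([^\n:]+):([^\n]+)/', $staticMarkup, $new);

print_r($old[1]);  // keys with the original pattern
print_r($new[1]);  // keys with the revised pattern
print_r($new[2]);  // values with the revised pattern
```

With the original pattern the keys should come out as "heading\nid" and "\nclass"; with the revised one they are "id" and "class". The values keep their leading space in both cases, so a trim() on each value may still be wanted.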
