
PCRE RegEx difference between PHP and MariaDB

I have a RegEx that does not match using PHP (nor regex101.com) but does with MariaDB. Its purpose is to search for HTML classes in XML values (the HTML is encoded).

Here is an example XML value where you can see a <ul> element with a liste--non-ordonnee--gros--exergue CSS class:

&lt;ul class=&quot;liste--non-ordonnee--gros--exergue&quot;&gt;

I want the RegEx to match only full classes. Therefore, if I search for --exergue, I don't want it to match. Using PHP or any other PCRE/PCRE2 online tester, it does not match:

~(class=&quot;(?:[^&]*\s)?)--exergue~sU

But using MariaDB (v10.2.40 - PCRE 8.42), it matches:

(?sU)(class=&quot;(?:[^&]*\s)?)--exergue

It looks for a class attribute containing the class to replace. I tried to change the class name to something else for demonstration purposes (searching for --suffix in class-with--suffix), but then it no longer matched on the MariaDB version.

What is wrong with my RegEx or its MariaDB version?

I am aware that regular expressions should not be used with HTML and am open to alternatives, but this is TYPO3: it stores encoded HTML in XML values in a db column. Design changes require massive class renaming.

This matches with PHP preg 8.0.x, within 169 steps (there still might be room for improvement):

.*\sclass=&quot;(?:.*)?&quot;.*

The full XML would have been relevant for matching more accurately than against a single occurrence. By removing the sample data, you may have misrepresented the problem.

First, the very short MCVE of your case:

SELECT 'class=&quot;s--e' REGEXP '(?sU)(class=&quot;(?:[^&]*\s)?)--e'

MariaDB matches, PHP doesn't match (demo). Why? As MariaDB's manual for REGEXP says:

Note: Because MariaDB uses the C escape syntax in strings (for example, "\n" to represent the newline character), you must double any "\" that you use in your REGEXP strings.

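Your issue is caused by the \s, which should be double-escaped as \\s in your MariaDB query. With a single backslash, the string-literal parser drops the backslash of the unrecognized escape, so the regex engine receives a literal s. In your original sample that s matches the last letter of gros, right before --exergue, which is also why class-with--suffix did not match. Once you fix that, the PHP and SQL regex statements become equivalent and behave the same.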
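The effect of the missing backslash can be sketched outside the database. The Python snippet below is only an illustration under stated assumptions: mariadb_unescape is a hypothetical helper that roughly emulates MariaDB's default string-literal escaping (NO_BACKSLASH_ESCAPES off), and Python's re module stands in for PCRE. Python has no (?U) flag, so it is left out here; ungreediness changes which text is captured, not whether these patterns match.

```python
import re


def mariadb_unescape(literal: str) -> str:
    """Hypothetical helper: roughly emulate how MariaDB reads backslashes
    inside a quoted string literal (default sql_mode, simplified)."""
    # Escapes with a special meaning; \% and \_ keep their backslash.
    known = {"0": "\0", "b": "\b", "n": "\n", "r": "\r", "t": "\t",
             "Z": "\x1a", "%": "\\%", "_": "\\_"}
    out = []
    i = 0
    while i < len(literal):
        ch = literal[i]
        if ch == "\\" and i + 1 < len(literal):
            nxt = literal[i + 1]
            # Any other escaped character loses its backslash: \s -> s, \\ -> \
            out.append(known.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


subject = "&lt;ul class=&quot;liste--non-ordonnee--gros--exergue&quot;&gt;"

# The pattern as typed in the SQL query, without the inline (?sU) flags.
typed_single = r"(class=&quot;(?:[^&]*\s)?)--exergue"
typed_double = r"(class=&quot;(?:[^&]*\\s)?)--exergue"

seen_single = mariadb_unescape(typed_single)  # regex engine gets a literal s
seen_double = mariadb_unescape(typed_double)  # regex engine gets \s

print(seen_single)                                   # (class=&quot;(?:[^&]*s)?)--exergue
print(bool(re.search(seen_single, subject, re.S)))   # True: the s of "gros" matches
print(bool(re.search(seen_double, subject, re.S)))   # False: no whitespace before --exergue
```

In this sketch the single-backslash pattern matches only because the class name happens to contain an s directly before --exergue, which is consistent with the MariaDB result described in the question.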

This won't match (--e is not preceded by whitespace):

SELECT 'class=&quot;s--e' REGEXP '(?sU)(class=&quot;(?:[^&]*\\s)?)--e'; 

This will match (--e is preceded by a space):

SELECT 'class=&quot;s --e' REGEXP '(?sU)(class=&quot;(?:[^&]*\\s)?)--e';
