I don't understand this Textile Regex

I found the following regex in the PHP code of Textism's Textile:

/\b ?[([]TM[])]/i

I consider myself to be experienced in reading regexes, but this one is a mystery to me. The beginning is easy, but I don't understand why there are two empty character classes inside of an already opened character class: [[][]] ?

Can someone shed some light on this issue?

It is a rather cryptic one...

Here's what it means:

/     # start regex pattern
\b    # word boundary
 ?    # an optional space
[([]  # char class: either '(' or '['
TM    # literal 'TM'
[])]  # char class: either ']' or ')'
/     # end regex pattern
i     # match case insensitive
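
The breakdown above can also be written directly into the pattern. The following is a minimal sketch, assuming PCRE's x (extended) flag; the variable names and the sample string are illustrative and not taken from Textile. Under x, unescaped whitespace in the pattern is ignored, so the optional space is written as [ ]? rather than as a bare space.

```php
<?php
// Compact pattern from the question.
$compact = '/\b ?[([]TM[])]/i';

// Same pattern in extended (x) mode, with the annotations as inline comments.
// The optional space is spelled [ ]? because a bare space is ignored under x.
$verbose = '/
    \b      # word boundary
    [ ]?    # an optional space
    [([]    # char class: ( or [
    TM      # literal TM
    [])]    # char class: ] or )
/xi';

$subject = 'foo (TM) bar [TM] baz (tm]';   // illustrative input

preg_match_all($compact, $subject, $a);
preg_match_all($verbose, $subject, $b);

// true when both spellings find exactly the same matches
var_dump($a[0] === $b[0]);
```

The comments inside the verbose pattern avoid the / character on purpose, since PHP would read it as the closing delimiter.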

Some things to note:

  • inside a character class, [ is not special and need not be escaped ( [([] is therefore valid!)
  • inside a character class, a ] that comes first (right after the opening [, or after a negating ^) is taken literally and need not be escaped ( [])] is therefore valid: ] needs no escape!)
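
Both notes can be checked one class at a time. This is a small sketch using PHP's preg_match, with each class anchored so that it has to match exactly one character; the test characters are chosen only for illustration.

```php
<?php
// [([] is a single class containing ( and [
var_dump(preg_match('/^[([]$/', '('));   // int(1)
var_dump(preg_match('/^[([]$/', '['));   // int(1)

// [])] is a single class containing ] and ), because ] comes first
var_dump(preg_match('/^[])]$/', ']'));   // int(1)
var_dump(preg_match('/^[])]$/', ')'));   // int(1)

// Neither class matches an unrelated character
var_dump(preg_match('/^[([]$/', 'x'));   // int(0)
var_dump(preg_match('/^[])]$/', 'x'));   // int(0)
```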

To summarize, it matches "TM", case-insensitively, with either [ or ( in front of it and either ] or ) after it (the brackets do not need to pair up: "[TM)" will be matched in most cases). I say in most cases because of the leading \b ? part: everything that can follow the \b is a non-word character (a space or a bracket), so the \b only succeeds directly after a word character. That causes "[tm)" to be excluded from the matches in the demo below, because it is preceded by ". ", which does not match \b ? :

<?php
// Run the pattern over a sample string and collect every match.
preg_match_all(
    '/\b ?[([]TM[])]/i',
    "... [tm) foo (TM) bar [TM] baz (tm] ...",
    $matches
);
print_r($matches);
/* Output:
Array
(
    [0] => Array
        (
            [0] =>  (TM)
            [1] =>  [TM]
            [2] =>  (tm]
        )

)
*/
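
To isolate the effect of the leading \b ?, here is a short sketch that feeds the same token to the pattern with different text in front of it. The input strings are made up for illustration.

```php
<?php
$pattern = '/\b ?[([]TM[])]/i';

// Hypothetical inputs: the same token after a word character,
// after punctuation, and at the very start of the string.
$inputs = ['foo [tm)', 'foo[tm)', '. [tm)', '[tm)'];

foreach ($inputs as $input) {
    $found = preg_match($pattern, $input, $m);
    printf("%-10s => %s\n", "'$input'", $found ? "match '{$m[0]}'" : 'no match');
}
```

The first two inputs match, because a word character sits right before the optional space or the bracket. The last two do not: there is no word character in front of the token, so \b fails at every position where the rest of the pattern could start.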

EDIT: ] is allowed as the first character of a character class in the POSIX flavor of regular expressions, where it is taken literally. See http://www.regular-expressions.info/posixbrackets.html . In PHP, the ereg functions (removed in PHP 7) used POSIX, while the preg_ functions use the newer PCRE flavor, which accepts this construct as well, as the preg_match_all demo above shows.

So, in a flavor that accepts this:

[([]

is one character class consisting of ( and [, and

[])] 

is another one consisting of ] and ). Some regexp engines (JavaScript, for example, reads [] as an empty character class) would require the second character class to be written

[\])]

instead.
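
A small check, assuming PHP's preg_ functions (PCRE), of how the escaped and unescaped spellings of the second class compare there. The sample string and variable names are illustrative.

```php
<?php
$subject = 'foo (TM) bar [TM] baz (tm]';   // illustrative input

// Second class written with ] first and unescaped ...
preg_match_all('/\b ?[([]TM[])]/i',  $subject, $unescaped);
// ... and with ] escaped.
preg_match_all('/\b ?[([]TM[\])]/i', $subject, $escaped);

// true when both spellings produce identical matches
var_dump($unescaped[0] === $escaped[0]);
```

This only says something about PCRE. In a POSIX bracket expression the backslash is an ordinary character, so the escaped spelling would not mean the same thing there.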
