
Regular expression to find pattern in string in PHP

Suppose I have a string that looks like:

"lets refer to [[merp] [that entry called merp]] and maybe also to that entry called [[blue] [blue]]"

The idea here is to replace a block of [[name][some text]] with <a href="name.html">some text</a>.

So I'm trying to use regular expressions to find blocks that look like [[name][some text]], but I'm having tremendous difficulty.

Here's what I thought should work (in PHP): preg_match_all('/\[\[.*\]\[.*\]/', $my_big_string, $matches)

But this just returns a single match, the string from '[[merp' to 'blue]]'. How can I get it to return the two matches [[merp][that entry called merp]] and [[blue][blue]]?

The regex you're looking for is \[\[(.+?)\]\s\[(.+?)\]\] ; replace its matches with <a href="$1">$2</a>.

The text matched inside the () parentheses is captured and can be back-referenced using $1, $2, ...

Example on regex101.com
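As a rough sketch of how that suggestion might look in PHP, the pattern can be passed to preg_replace together with the sample string from the question. Appending ".html" to the first group is an assumption based on the output the question asks for; the answer itself uses a bare $1.

```php
<?php
// Sample string from the question (note the space between "]" and "[").
$text = 'lets refer to [[merp] [that entry called merp]] and maybe also to that entry called [[blue] [blue]]';

// Two lazy groups: $1 captures the name, $2 captures the link text.
$pattern = '/\[\[(.+?)\]\s\[(.+?)\]\]/';

// ".html" is an assumption taken from the question's desired output.
$html = preg_replace($pattern, '<a href="$1.html">$2</a>', $text);

echo $html, PHP_EOL;
```

Because \s matches exactly one whitespace character, this pattern would not match blocks written without a space, such as [[name][some text]].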

Quantifiers like * are greedy by default, which means they match as much as possible while still letting the overall pattern succeed. E.g. in your sample a regex like \[.*\] would match everything from the first [ to the last ] in the string. To change the default behaviour and make quantifiers lazy (ungreedy, reluctant):

  • Use the U (PCRE_UNGREEDY) modifier to make all quantifiers lazy
  • Put a ? after a specific quantifier. E.g. .*? matches as few of any characters as possible
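The difference between greedy and lazy matching can be checked with a small experiment. This is only an illustrative sketch using the sample string from the question; the variable names are made up for the example.

```php
<?php
$text = 'lets refer to [[merp] [that entry called merp]] and maybe also to that entry called [[blue] [blue]]';

// Greedy: .* runs on to the last "]" in the string, so everything is one match.
preg_match_all('/\[.*\]/', $text, $greedy);

// Lazy: .*? stops at the first "]" it reaches, so several short matches result.
preg_match_all('/\[.*?\]/', $text, $lazy);

// The U modifier inverts the default, so a plain .* behaves like .*? here.
preg_match_all('/\[.*\]/U', $text, $ungreedy);

echo count($greedy[0]), PHP_EOL;
echo count($lazy[0]), PHP_EOL;
var_dump($lazy[0] === $ungreedy[0]);
```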

1.) Using the U modifier, a pattern could look like:

/\[\[(.*)]\s*\[(.*)]]/Us

Additionally, the s (PCRE_DOTALL) modifier is used to make the . dot also match newlines, and \s* allows for the whitespace between ][ that appears in your sample string. \s is a shorthand for whitespace characters such as [ \t\r\n\f].

There are then two capturing groups (.*) to be used in the replacement. Test on regex101.com
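To see what the two groups capture, the U-modifier pattern can be run through preg_match_all. This is a minimal sketch; the PREG_SET_ORDER flag is chosen here only to make each match easy to loop over.

```php
<?php
$text = 'lets refer to [[merp] [that entry called merp]] and maybe also to that entry called [[blue] [blue]]';

// U makes every quantifier lazy; s lets the dot match newlines as well.
preg_match_all('/\[\[(.*)]\s*\[(.*)]]/Us', $text, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[0] is the whole block, $m[1] the name, $m[2] the link text.
    printf("%s => %s\n", $m[1], $m[2]);
}
```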


2.) Instead, putting a ? after each quantifier to make it lazy:

/\[\[(.*?)]\s*\[(.*?)]]/s

Test on regex101.com


3.) Alternative without modifiers, if no square brackets are expected inside [...]:

/\[\[([^]]*)]\s*\[([^]]*)]]/

This uses a negated character class: [^]]* allows any number of characters that are NOT ] between [ and ]. It doesn't rely on greediness or laziness. Also, no . is used, so the s modifier is not needed.

Test on regex101.com
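The point about not needing the s modifier can be illustrated with link text that spans two lines. The input below is hypothetical and not taken from the question.

```php
<?php
// Hypothetical input: the link text contains a newline.
$text = "see [[merp] [that entry\ncalled merp]] here";

// A negated class matches any character except "]", newlines included.
$found = preg_match('/\[\[([^]]*)]\s*\[([^]]*)]]/', $text, $m);

// For comparison: the lazy-dot pattern without the s modifier cannot
// cross the newline, so it finds nothing in this input.
$foundWithoutS = preg_match('/\[\[(.*?)]\s*\[(.*?)]]/', $text);

var_dump($found, $m[1], $m[2], $foundWithoutS);
```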


Replacement for all 3 examples according to your sample: <a href="\1">\2</a> where \1 corresponds to the match of the first parenthesized group and \2 to that of the second.
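Putting it together, a short sketch can apply that replacement with each of the three patterns and compare the results. The array labels are illustrative only; on the sample string all three patterns should produce the same output.

```php
<?php
$text = 'lets refer to [[merp] [that entry called merp]] and maybe also to that entry called [[blue] [blue]]';

// The three patterns from above, keyed by an illustrative label.
$patterns = [
    'U modifier'    => '/\[\[(.*)]\s*\[(.*)]]/Us',
    'lazy ?'        => '/\[\[(.*?)]\s*\[(.*?)]]/s',
    'negated class' => '/\[\[([^]]*)]\s*\[([^]]*)]]/',
];

$results = [];
foreach ($patterns as $label => $pattern) {
    // \1 and \2 refer to the first and second capturing group.
    $results[$label] = preg_replace($pattern, '<a href="\1">\2</a>', $text);
    echo $label, ': ', $results[$label], PHP_EOL;
}
```

To get the name.html targets asked for in the question, the replacement string could be changed to <a href="\1.html">\2</a>.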
