
Regular expression to find pattern in string in PHP

Suppose I have a string that looks like:

"lets refer to [[merp] [that entry called merp]] and maybe also to that entry called [[blue] [blue]]"

The idea here is to replace each block of the form [[name][some text]] with <a href="name.html">some text</a>.

So I'm trying to use regular expressions to find blocks that look like [[name][some text]], but I'm having tremendous difficulty.

Here's what I thought should work (in PHP): preg_match_all('/\\[\\[.*\\]\\[.*\\]/', $my_big_string, $matches)

But this just returns a single match, the string from '[[merp' to 'blue]]'. How can I get it to return the two matches [[merp][that entry called merp]] and [[blue][blue]]?

The regex you're looking for is \[\[(.+?)\]\s\[(.+?)\]\] and the replacement is <a href="$1.html">$2</a> (the .html suffix matches the name.html target in the question).

The text matched inside each pair of parentheses () is captured and can be back-referenced as $1, $2, ...

Example on regex101.com
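As a minimal sketch of how that pattern could be applied with preg_replace, assuming the sample string from the question (the variable names $text, $pattern and $html are only illustrative):

```php
<?php
// Sample text from the question; note the single space between "]" and "[".
$text = 'lets refer to [[merp] [that entry called merp]] and maybe also to that entry called [[blue] [blue]]';

// Each lazy group (.+?) stops at the first "]" that lets the rest of the pattern match.
$pattern = '/\[\[(.+?)\]\s\[(.+?)\]\]/';

// $1 is the entry name, $2 the link text; ".html" follows the target named in the question.
$html = preg_replace($pattern, '<a href="$1.html">$2</a>', $text);

echo $html, PHP_EOL;
```

Note that \s here requires exactly one whitespace character between the two bracket pairs, which fits the sample but not a block written as [[name][some text]].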

Quantifiers like * are greedy by default, which means they match as much as possible while still letting the overall pattern succeed. E.g. in your sample, a regex like \[.*\] would match everything from the first [ to the last ] in the string. To change the default behaviour and make quantifiers lazy (ungreedy, reluctant):

  • Use the U (PCRE_UNGREEDY) modifier to make all quantifiers lazy
  • Put a ? after a specific quantifier, e.g. .*? matches as few characters as possible
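The difference can be checked directly. The following is a small sketch, assuming the sample string from the question, that counts matches for the greedy pattern and for both lazy variants:

```php
<?php
$text = 'lets refer to [[merp] [that entry called merp]] and maybe also to that entry called [[blue] [blue]]';

// Greedy: .* runs to the last "]" in the string, so everything merges into one match.
$greedyCount = preg_match_all('/\[.*\]/', $text, $greedy);

// Lazy via "?": each .*? stops at the nearest "]".
$lazyCount = preg_match_all('/\[.*?\]/', $text, $lazy);

// Lazy via the U modifier: same effect without touching the quantifier itself.
$ungreedyCount = preg_match_all('/\[.*\]/U', $text, $ungreedy);

echo $greedyCount, ' greedy match, ', $lazyCount, ' lazy matches', PHP_EOL;
print_r($lazy[0]);
```

The lazy variants find four short pieces ([[merp], [that entry called merp], [[blue], [blue]) because this simplified pattern only looks for a single bracket pair.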

1.) Using the U modifier, a pattern could look like:

/\[\[(.*)]\s*\[(.*)]]/Us

Additionally, the s (PCRE_DOTALL) modifier is used so that the . also matches newlines, and \s* is added between ] and [ to allow for the whitespace in your sample string. \s is a shorthand for whitespace characters such as [ \t\r\n\f].

The two capturing groups (.*) hold the name and the link text for the replacement. Test on regex101.com
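To see what the s modifier contributes, here is a hedged sketch using a hypothetical input in which the link text wraps onto a second line (this input is not from the question):

```php
<?php
// Hypothetical input: the link text of the first block contains a line break.
$text = "see [[merp] [that entry\ncalled merp]] and [[blue] [blue]]";

// U makes both (.*) groups lazy; s lets . match the embedded newline.
$withS = preg_match_all('/\[\[(.*)]\s*\[(.*)]]/Us', $text, $matches);

print_r($matches[1]); // entry names
print_r($matches[2]); // link texts

// Without s, . cannot cross the newline, so the first block is not matched.
$withoutS = preg_match_all('/\[\[(.*)]\s*\[(.*)]]/U', $text, $partial);

echo $withS, ' vs ', $withoutS, PHP_EOL;
```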


2.) Instead, using ? to make each quantifier lazy:

/\[\[(.*?)]\s*\[(.*?)]]/s

Test on regex101.com
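For the original goal of getting the two blocks back from preg_match_all, a sketch along these lines may help. It assumes PHP 7.1+ for the array destructuring in foreach, and the names $sets and $links are illustrative:

```php
<?php
$text = 'lets refer to [[merp] [that entry called merp]] and maybe also to that entry called [[blue] [blue]]';

// PREG_SET_ORDER groups the results per match: [full match, name, link text].
preg_match_all('/\[\[(.*?)]\s*\[(.*?)]]/s', $text, $sets, PREG_SET_ORDER);

$links = [];
foreach ($sets as [$full, $name, $label]) {
    $links[$name] = $label;
    echo $full, ' => ', $name, ' / ', $label, PHP_EOL;
}
```

Without PREG_SET_ORDER the same data comes back grouped by capture group instead, i.e. $sets[1] would hold all names and $sets[2] all link texts.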


3.) Alternative without modifiers, if no square brackets are expected inside [...]:

/\[\[([^]]*)]\s*\[([^]]*)]]/

[^]]* uses a negated character class (the leading ^) to match any number of characters that are NOT ] between [ and ]. This does not rely on greediness at all. Also, no . is used, so the s modifier is not needed.

Test on regex101.com
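A short sketch of the negated-class variant follows, again with a hypothetical multi-line input, plus one case showing the restriction on brackets inside the text:

```php
<?php
// Hypothetical input with a line break inside the link text.
$text = "see [[merp] [that entry\ncalled merp]] and [[blue] [blue]]";

// [^]]* matches any run of characters except "]", newlines included,
// so neither the U nor the s modifier is required.
$pattern = '/\[\[([^]]*)]\s*\[([^]]*)]]/';
$count = preg_match_all($pattern, $text, $matches);

print_r($matches[2]);

// The restriction mentioned for this variant: a "]" inside the link text prevents a match.
$nested = preg_match($pattern, '[[a] [see item [1] here]]');

echo $count, ' matches, nested case: ', $nested, PHP_EOL;
```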


Replacement for all 3 examples, according to your sample: <a href="\1.html">\2</a> where \1 corresponds to the match of the first parenthesized group, \2 to the second, and so on.
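Putting it together, this sketch runs the three patterns through preg_replace on the sample string. The array keys are just labels, and the .html suffix is taken from the target format in the question:

```php
<?php
$text = 'lets refer to [[merp] [that entry called merp]] and maybe also to that entry called [[blue] [blue]]';

$patterns = [
    'U modifier'    => '/\[\[(.*)]\s*\[(.*)]]/Us',
    'lazy ?'        => '/\[\[(.*?)]\s*\[(.*?)]]/s',
    'negated class' => '/\[\[([^]]*)]\s*\[([^]]*)]]/',
];

$results = [];
foreach ($patterns as $label => $pattern) {
    // Single quotes keep \1 and \2 intact so preg_replace sees them as back-references.
    $results[$label] = preg_replace($pattern, '<a href="\1.html">\2</a>', $text);
}

echo $results['lazy ?'], PHP_EOL;
```

On this input all three patterns should produce the same output; they only diverge on inputs with newlines or nested brackets inside the blocks.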
