
Get quotation marks inside a tag with regex

Hi there. I'm trying to get all quotation marks between a specific start string and end string. Let's say I have this string:

`Hello "world". [start]this is a "mark"[end]. It should work with [start]"several" "marks"[end]`

Now I want every `"` inside `[start]` .. `[end]` to be replaced by `&quot;`:

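```php
$string = 'Hello "world". [start]this is a "mark"[end]. It should work with [start]"several" "marks"[end]';
// The lookbehind/lookahead select the whole text between [start] and [end]
$regex = '/(?<=\[start])(.*?)(?=\[end])/';
$replace = '&quot;';

// This replaces each whole block with a single &quot;, not just the quotes in it
$string = preg_replace($regex,$replace,$string);
```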

This matches the whole text between `[start]` and `[end]`, but I only want to match the `"` characters inside it. Expected result:

```
Hello "world". [start]this is a &quot;mark&quot;[end]. It should work with [start]&quot;several&quot; &quot;marks&quot;[end]
```

Any ideas?

```
(?s)"(?=((?!\[start\]).)*\[end\])
```


Explanation:

```
 (?s)                       DOTALL modifier (. also matches newlines)
 "                          Literal "
 (?=                        Begin lookahead
      (                         # (1 start)
           (?! \[start\] )          Current position must not be followed by [start]
           .                        If it isn't, match any one character
      )*                        # (1 end)
      \[end\]                   Until reaching [end]
 )                          End lookahead
```
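Because this pattern matches only the `"` characters themselves, it can be passed straight to `preg_replace`. Here is a minimal sketch that reuses the question's sample string. The `$result` variable name is only for illustration. The inline `(?s)` is kept as written above:

```php
<?php
$string = 'Hello "world". [start]this is a "mark"[end]. It should work with [start]"several" "marks"[end]';

// Match a " only if, scanning forward, [end] is reached before any [start].
$regex = '/(?s)"(?=((?!\[start\]).)*\[end\])/';

$result = preg_replace($regex, '&quot;', $string);
echo $result, PHP_EOL;
// Hello "world". [start]this is a &quot;mark&quot;[end]. It should work with [start]&quot;several&quot; &quot;marks&quot;[end]
```

The lookahead only checks what comes after each quote. As a result, a `"` followed by a stray `[end]` with no opening `[start]` would also be replaced, so the pattern assumes well-formed blocks.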


A `preg_replace_callback` approach lets you use a simpler regex (assuming your string always contains paired, non-nested `[start]...[end]` blocks):

```php
$string = 'Hello "world". [start]this is a "mark"[end]. It should work with [start]"several" "marks"[end]';
// Match each [start]...[end] block, lazily and across newlines (/s)
$regex = '/\[start].*?\[end]/s';
$string = preg_replace_callback($regex, function($m) {
    // $m[0] is the whole block; escape only the quotes inside it
    return str_replace('"', '&quot;', $m[0]);
},$string);
echo $string;
// => Hello "world". [start]this is a &quot;mark&quot;[end]. It should work with [start]&quot;several&quot; &quot;marks&quot;[end]
```


The `'/\[start].*?\[end]/s'` regex matches `[start]`, then any 0+ characters, as few as possible (including newlines, since the `/s` DOTALL modifier is used), and then `[end]`.
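The following small sketch shows what the `/s` modifier contributes. It runs the same callback approach on a block that spans a line break, once with `/s` and once without it. The sample text and the `$escapeQuotes`, `$withS` and `$withoutS` names are made up for illustration:

```php
<?php
// Replaces every " inside the matched block with &quot;.
$escapeQuotes = function (array $m): string {
    return str_replace('"', '&quot;', $m[0]);
};

// A block that spans a line break.
$text = "[start]line \"one\"\nline \"two\"[end]";

// With /s, . also matches "\n", so the block is found and all four quotes are replaced.
$withS = preg_replace_callback('/\[start].*?\[end]/s', $escapeQuotes, $text);

// Without /s, . stops at "\n", so [end] is never reached and the text is returned unchanged.
$withoutS = preg_replace_callback('/\[start].*?\[end]/', $escapeQuotes, $text);

echo $withS, PHP_EOL, $withoutS, PHP_EOL;
```

If your blocks never contain line breaks, `/s` makes no difference. It only matters for multi-line blocks.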

If you need to ensure the window between `[start]` and `[end]` is as short as possible (i.e. it contains no other `[start]` or `[end]`), you will need a regex with a tempered greedy token, as in Revo's answer: `'/\[start](?:(?!\[(?:start|end)]).)*\[end]/s'`.
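The difference shows up when an unpaired `[start]` comes before a real block. The sketch below runs both patterns on such a string, using the same quote-escaping callback. The input and variable names are hypothetical:

```php
<?php
// Replaces every " inside the matched block with &quot;.
$escapeQuotes = function (array $m): string {
    return str_replace('"', '&quot;', $m[0]);
};

// Hypothetical input with a stray, unpaired [start] before the real block.
$text = '[start] a "x" [start]"b"[end]';

// Lazy pattern: the match begins at the FIRST [start], so "x" is escaped too.
$lazy = preg_replace_callback('/\[start].*?\[end]/s', $escapeQuotes, $text);

// Tempered greedy token: the body may not contain another [start] or [end],
// so the match begins at the [start] nearest to [end] and only "b" is escaped.
$tempered = preg_replace_callback('/\[start](?:(?!\[(?:start|end)]).)*\[end]/s', $escapeQuotes, $text);

echo $lazy, PHP_EOL;     // [start] a &quot;x&quot; [start]&quot;b&quot;[end]
echo $tempered, PHP_EOL; // [start] a "x" [start]&quot;b&quot;[end]
```

On this input, the lookahead regex from the first answer also leaves `"x"` untouched, because a `[start]` is reached before `[end]` when scanning forward from those quotes.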


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM