
Get quotation marks inside a tag with regex

Hi there. I'm trying to match every quotation mark that appears between a specific start string and end string. Let's say I have this string:

`Hello "world". [start]this is a "mark"[end]. It should work with [start]"several" "marks"[end]`

Now I want every " inside [start] ... [end] to be replaced by &quot;:

$string = 'Hello "world". [start]this is a "mark"[end]. It should work with [start]"several" "marks"[end]';
$regex = '/(?<=\[start])(.*?)(?=\[end])/';
$replace = '&quot;';

$string = preg_replace($regex,$replace,$string);

This matches the whole text between [start] and [end], but I only want to match the " characters inside it:

//expected: Hello "world". [start]this is a &quot;mark&quot;[end]. It should work with [start]&quot;several&quot; &quot;marks&quot;[end]

Any ideas?

(?s)"(?=((?!\[start\]).)*\[end\])

Live demo

Explanation:

 (?s)                       DOTALL modifier: . also matches newlines
 "                          Literal "
 (?=                        Begin lookahead
      (                         # (1 start)
           (?! \[start\] )          Assert that [start] does not begin at the current position
           .                        If so, consume one character
      )*                        # (1 end), repeated zero or more times
      \[end\]                   Until reaching [end]
 )                          End lookahead

PHP live demo
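The pattern above is given without delimiters. As a minimal sketch, assuming `/` delimiters and the sample string from the question, it could be passed to preg_replace like this (the variable name `$result` is only illustrative):

```php
<?php
$string = 'Hello "world". [start]this is a "mark"[end]. It should work with [start]"several" "marks"[end]';

// Match a " only if [end] can be reached from it without crossing another [start].
$regex = '/(?s)"(?=((?!\[start\]).)*\[end\])/';

$result = preg_replace($regex, '&quot;', $string);
echo $result, PHP_EOL;
// => Hello "world". [start]this is a &quot;mark&quot;[end]. It should work with [start]&quot;several&quot; &quot;marks&quot;[end]
```

Note that the lookahead only checks what follows each quote, so this sketch relies on every [end] being preceded by a matching [start]; a stray [end] later in the text would make quotes before it match as well.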

An approach with preg_replace_callback lets you use a simpler regex (assuming your string always contains paired, non-nested [start]...[end] blocks):

$string = 'Hello "world". [start]this is a "mark"[end]. It should work with [start]"several" "marks"[end]';
$regex = '/\[start].*?\[end]/s';
$string = preg_replace_callback($regex, function($m) {
    return str_replace('"', '&quot;', $m[0]);
},$string);
echo $string;
// => Hello "world". [start]this is a &quot;mark&quot;[end]. It should work with [start]&quot;several&quot; &quot;marks&quot;[end]

See the PHP IDEONE demo.

The '/\[start].*?\[end]/s' regex matches [start], then any 0+ chars, as few as possible (including newlines, since the /s DOTALL modifier is used), and then [end].
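To illustrate why the /s modifier matters, here is a small sketch with a hypothetical two-line input (the `$text` value and the `$escape` helper are made up for this example and are not part of the original answer):

```php
<?php
// Hypothetical input: the [start]...[end] block spans two lines.
$text = "[start]first \"line\"\nsecond \"line\"[end]";

// Helper that escapes the quotes inside whatever the regex matched.
$escape = function (array $m): string {
    return str_replace('"', '&quot;', $m[0]);
};

// Without /s, "." does not match the newline, so the block is never matched.
$withoutS = preg_replace_callback('/\[start].*?\[end]/', $escape, $text);

// With /s, "." also matches the newline and the whole block is matched.
$withS = preg_replace_callback('/\[start].*?\[end]/s', $escape, $text);

var_dump($withoutS === $text); // the input comes back unchanged
echo $withS, PHP_EOL;
```

If the blocks are guaranteed to stay on one line, the modifier makes no difference, but keeping it is harmless.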

If you need to ensure the shortest possible window between a [start] and the following [end], you will need a regex with a tempered greedy token, as in Revo's answer: '/\[start](?:(?!\[(?:start|end)]).)*\[end]/s' (see the PHP demo and a regex demo).
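The difference between the lazy pattern and the tempered greedy token only shows up on input with an unpaired [start]. A minimal sketch, using a hypothetical string that is not from the question:

```php
<?php
// Hypothetical input with a stray, unclosed [start] before a proper block.
$text = '[start]"a" [start]"b"[end]';

// Helper that escapes the quotes inside whatever the regex matched.
$escape = function (array $m): string {
    return str_replace('"', '&quot;', $m[0]);
};

// Lazy version: the match begins at the first [start] and runs to [end].
$lazy = preg_replace_callback('/\[start].*?\[end]/s', $escape, $text);

// Tempered version: the match may not contain another [start] or [end],
// so only the innermost [start]"b"[end] block is matched.
$tempered = preg_replace_callback(
    '/\[start](?:(?!\[(?:start|end)]).)*\[end]/s',
    $escape,
    $text
);

echo $lazy, PHP_EOL;     // [start]&quot;a&quot; [start]&quot;b&quot;[end]
echo $tempered, PHP_EOL; // [start]"a" [start]&quot;b&quot;[end]
```

On well-formed input with properly paired blocks, both patterns should give the same result, so the simpler lazy pattern is usually enough.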
