Regex capture multi-line groups

I'm struggling to create a regex that captures what lies between two keywords in a multi-line file.

In particular, consider the following file:

#%META
# date: 2022-08-27
# generated-by: Me
# id: 1
#%ENDS

#%BODY
....
#%ENDS

#%META
# date: 2022-08-27
# generated-by: Another Me
# id: 2
#%ENDS

#%BODY
....
#%ENDS

I want to extract what lies between the #%META and #%ENDS keywords, ideally without the leading # on each line. That is, the desired result is to capture both:

date: 2022-08-27
generated-by: Me
id: 1

and

date: 2022-08-27
generated-by: Another Me
id: 2

I came up with the following regex: (?<=#%META\n)([\S\s]*?)(?=#%ENDS\n)

However, it does not pick out the two chunks of text separately, and it does not remove the leading #.

Could anyone help with that?

Thanks a lot! :)

You might use a pattern that first captures everything between #%META and #%ENDS. Then post-process the capture group 1 values, removing the leading # and any spaces that follow it.

^#%META((?>\R(?!#%(?:META|ENDS)$).*)+)\R#%ENDS$

Explanation

  • ^ Start of a line (the m flag used in the example makes ^ match after every newline)
  • #%META Match literally
  • ( Capture group 1
    • (?> Atomic group
      • \R Match any Unicode newline sequence
      • (?!#%(?:META|ENDS)$) Negative lookahead, assert that the line is not #%META or #%ENDS
      • .* Match the whole line
    • )+ Close the atomic group and repeat 1+ times
  • ) Close group 1
  • \R Match any Unicode newline sequence
  • #%ENDS Match literally
  • $ End of a line (again because of the m flag)


Example

$re = '/^#%META((?>\R(?!#%(?:META|ENDS)$).*)+)\R#%ENDS$/m';
$str = '#%META
# date: 2022-08-27
# generated-by: Me
# id: 1
#%ENDS

#%BODY
....
#%ENDS

#%META
# date: 2022-08-27
# generated-by: Another Me
# id: 2
#%ENDS

#%BODY
....
#%ENDS';

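// $matches[1] holds the capture group 1 value of every META block
if (preg_match_all($re, $str, $matches)) {
    $result = array_map(function ($s) {
        // trim() drops the leading newline; /^#\h*/m strips "#" plus any horizontal whitespace at each line start
        return preg_replace("/^#\h*/m", "", trim($s));
    }, $matches[1]);
    var_export($result);
}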

Output

array (
  0 => 'date: 2022-08-27
generated-by: Me
id: 1',
  1 => 'date: 2022-08-27
generated-by: Another Me
id: 2',
)
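The raw group 1 values are not clean yet. Group 1 begins with the \R that follows #%META, so each value starts with a newline, and every line still carries its # prefix. That is why the callback above calls trim() before preg_replace(). The sketch below makes the intermediate value visible. For brevity it assumes a shortened copy of the sample file that keeps only the first META/BODY pair:

```php
<?php
// Sketch: inspect the raw capture group 1 value before any post-processing.
// The input is a shortened copy of the question's file (first META/BODY pair only).
$re  = '/^#%META((?>\R(?!#%(?:META|ENDS)$).*)+)\R#%ENDS$/m';
$str = "#%META\n# date: 2022-08-27\n# generated-by: Me\n# id: 1\n#%ENDS\n\n#%BODY\n....\n#%ENDS";

preg_match_all($re, $str, $matches);
$raw = $matches[1][0];

// Group 1 starts with the \R after "#%META", so the value begins with "\n"
var_export($raw);
echo PHP_EOL;

// trim() removes that leading newline; with /m, ^ then strips "#" plus
// horizontal whitespace at the start of every remaining line
$clean = preg_replace('/^#\h*/m', '', trim($raw));
var_export($clean);
echo PHP_EOL;
```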

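To find all matches, use preg_match_all instead of preg_match. The /m modifier only changes how ^ and $ behave; it does not make preg_match return more than the first match. Stripping the leading "# " from every line before matching also removes it from the captures.
Try this:

    // Strip "# " at the start of every line (/m lets ^ and $ match at line boundaries)
    $str = preg_replace_callback(
        '/^# (.*)$/m',
        static function ($m) {
            return $m[1];
        },
        $str
    ); // or just str_replace('# ', '', $str)
    // preg_match_all collects every META block, not just the first one
    preg_match_all('/(?<=#%META\n)([\S\s]*?)(?=#%ENDS\n)/', $str, $m);
    var_dump($m[1]);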
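Stripping the prefix from the whole $str before matching also rewrites any #%BODY line that starts with "# ". The str_replace('# ', '', $str) variant goes further and removes "# " anywhere in the text. If the BODY content has to stay untouched, one option is to match first and strip the prefix only inside each captured block. The sketch below assumes the file uses \n line endings, because the lookarounds contain a literal \n. It also adds a hypothetical BODY line, "# keep me", to show that BODY text is left alone:

```php
<?php
// Sketch: match with the question's lookaround pattern, then strip "# " only
// inside each captured META block. "# keep me" is a hypothetical BODY line.
$str = "#%META\n# date: 2022-08-27\n# generated-by: Me\n# id: 1\n#%ENDS\n\n"
     . "#%BODY\n# keep me\n#%ENDS\n\n"
     . "#%META\n# date: 2022-08-27\n# generated-by: Another Me\n# id: 2\n#%ENDS\n";

// preg_match_all returns every block; the lazy [\S\s]*? stops at the first "#%ENDS\n"
preg_match_all('/(?<=#%META\n)([\S\s]*?)(?=#%ENDS\n)/', $str, $m);

$blocks = array_map(
    // rtrim drops the newline left before "#%ENDS"; /m makes ^ match at each line start
    static fn (string $block): string => preg_replace('/^# /m', '', rtrim($block, "\n")),
    $m[1]
);

var_export($blocks);
echo PHP_EOL;
```

The input string itself is never modified, so the BODY section keeps its "# " lines. The arrow function syntax requires PHP 7.4 or later.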
