How to match a block from start to end using a regular expression

Question

I want to capture the whole block from a start header up to the end header, but not including the end header. For example:

<section1>
Base_Currency=EUR
Description=Revaluation
Grouping_File
<section2>

The match result should be:

<section1>
Base_Currency=EUR
Description=Revaluation
Grouping_File

The question is: how can I write a pattern for this match in Java using regex?

Answer 1

If your entire input is in this format, you can simply split:

String[] sections = input.split("\\R(?=<)");

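\\R is "any line-break sequence" (the regex itself is \R; the doubled backslash is Java string escaping), and (?=<) is a lookahead meaning "the next character is '<'". Because the lookahead consumes nothing, the '<' stays with the section that follows.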
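As a rough illustration of the split approach, here is a minimal runnable sketch. The class name SplitSections and the method splitSections are made up for this example, and it assumes that every section header starts a line with '<' and that no other line does.

```java
public class SplitSections {
    // Splits on a line break (\R) that is immediately followed by '<'.
    // The lookahead (?=<) is zero-width, so the '<' is kept in the next piece.
    public static String[] splitSections(String input) {
        return input.split("\\R(?=<)");
    }

    public static void main(String[] args) {
        String input = "<section1>\nBase_Currency=EUR\nDescription=Revaluation\n"
                + "Grouping_File\n<section2>\nfoo";
        String[] sections = splitSections(input);
        for (String section : sections) {
            System.out.println("--- section ---");
            System.out.println(section);
        }
    }
}
```

The line break that separates two sections is consumed by the split, so it appears in neither piece.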

If that is not the case, however, you need to reach into the regex toolbox:

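the DOTALL flag, so the dot also matches line breaks
the MULTILINE flag, so ^ matches at the start of each line
a negative lookahead, so you stop consuming at the start of the next section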

Assuming a "section" starts with "<" at the beginning of a line:

"(?sm)^<\\w+>(.(?!^<))*"

Use it like this:

String input = "<section1>\nBase_Currency=EUR\nDescription=Revaluation\nGrouping_File\n<section2>\nfoo";
Matcher matcher = Pattern.compile("(?sm)^<\\w+>(.(?!^<))*").matcher(input);
while (matcher.find()) {
    String section = matcher.group();
}
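For completeness, the snippet above can be wrapped into a self-contained program. This is only a sketch: the class name SectionMatcher and the helper findSections are illustrative names, not part of the original answer, and the pattern is the one given above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SectionMatcher {
    // (?s) DOTALL: '.' also matches line breaks.
    // (?m) MULTILINE: '^' matches at the start of every line.
    // (.(?!^<))* consumes one character at a time, but refuses a character
    // that is directly followed by a line starting with '<'.
    private static final Pattern SECTION =
            Pattern.compile("(?sm)^<\\w+>(.(?!^<))*");

    public static List<String> findSections(String input) {
        List<String> sections = new ArrayList<>();
        Matcher matcher = SECTION.matcher(input);
        while (matcher.find()) {
            sections.add(matcher.group());
        }
        return sections;
    }

    public static void main(String[] args) {
        String input = "<section1>\nBase_Currency=EUR\nDescription=Revaluation\n"
                + "Grouping_File\n<section2>\nfoo";
        for (String section : findSections(input)) {
            System.out.println("--- section ---");
            System.out.println(section);
        }
    }
}
```

With this pattern the line break just before the next header is not part of the match, because consuming it would put the matcher directly in front of a line that starts with '<'.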

Answer 2

If your input looks like this

<section1>
Base_Currency=EUR
Description=Revaluation
Grouping_File
<section2>
Base_Currency=EUR
Description=Revaluation
Grouping_File
<section3>
Base_Currency=EUR
Description=Revaluation
Grouping_File

then you can use the following regex

(?s)(<section\d+>.*?)(?=<section\d+>|$)

The explanation of the regex is

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (?s)                     set flags for this block (with . matching
                           \n) (case-sensitive) (with ^ and $
                           matching normally) (matching whitespace
                           and # normally)
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    <section                 '<section'
--------------------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
--------------------------------------------------------------------------------
    >                        '>'
--------------------------------------------------------------------------------
    .*?                      any character (0 or more times (matching
                             the least amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    <section                 '<section'
--------------------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
--------------------------------------------------------------------------------
    >                        '>'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    $                        before an optional \n, and the end of
                             the string
--------------------------------------------------------------------------------
  )                        end of look-ahead
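A possible way to apply this regex in Java is sketched below. The class name NumberedSectionMatcher and the method findSections are hypothetical, and the sketch assumes headers of the form <sectionN> as in the sample input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberedSectionMatcher {
    // Lazy .*? grows one character at a time until the lookahead sees
    // either the next <sectionN> header or the end of the input.
    private static final Pattern SECTION =
            Pattern.compile("(?s)(<section\\d+>.*?)(?=<section\\d+>|$)");

    public static List<String> findSections(String input) {
        List<String> sections = new ArrayList<>();
        Matcher matcher = SECTION.matcher(input);
        while (matcher.find()) {
            sections.add(matcher.group(1));
        }
        return sections;
    }

    public static void main(String[] args) {
        String body = "Base_Currency=EUR\nDescription=Revaluation\nGrouping_File";
        String input = String.join("\n",
                "<section1>", body, "<section2>", body, "<section3>", body);
        List<String> sections = findSections(input);
        System.out.println(sections.size());
        System.out.println(sections.get(0));
    }
}
```

Unlike the pattern in Answer 1, every match except the last keeps the line break that precedes the next header; call strip() or trim() on the result if that is unwanted.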

If you only want to match a single section, you can use

(?s)(<section\d+>[^<]*)

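The explanation of this regex is given below. Note that [^<]* stops at the first '<', so this only works when a section body never contains that character.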

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (?s)                     set flags for this block (with . matching
                           \n) (case-sensitive) (with ^ and $
                           matching normally) (matching whitespace
                           and # normally)
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    <section                 '<section'
--------------------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
--------------------------------------------------------------------------------
    >                        '>'
--------------------------------------------------------------------------------
    [^<]*                    any character except: '<' (0 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
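To show the behaviour of this simpler pattern, including its limitation, here is a small sketch. The class name FirstSectionMatcher and the method firstSection are illustrative only; returning null when nothing matches is a choice made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstSectionMatcher {
    // [^<]* matches any run of characters other than '<', line breaks included,
    // so the match ends right before the next '<' (or at the end of input).
    private static final Pattern SECTION =
            Pattern.compile("(?s)(<section\\d+>[^<]*)");

    // Returns the first section found, or null if there is none.
    public static String firstSection(String input) {
        Matcher matcher = SECTION.matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // Normal case: the body contains no '<'.
        System.out.println(firstSection("<section1>\nA=1\n<section2>\nB=2"));
        // Limitation: a '<' inside the body cuts the match short.
        System.out.println(firstSection("<section1>\nExpr=a<b\n<section2>"));
    }
}
```

Since the pattern contains no dot, the (?s) flag has no effect here; it is kept only to mirror the regex as given.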

2 solutions

Solution 1
2 2017-02-21 15:27:23

Solution 2
1 Accepted 2017-02-21 15:34:56
