
How to match a block from start to end using regex

I want to capture the whole block from the start header up to, but not including, the end header. For example:

<section1>
Base_Currency=EUR
Description=Revaluation
Grouping_File
<section2>

The match should be:

<section1>
Base_Currency=EUR
Description=Revaluation
Grouping_File

My question is: how do I write a regex pattern in Java that produces this match?

If your whole input is in this format, you can simply split it:

String[] sections = input.split("\\R(?=<)");

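\R (written "\\R" in a Java string literal) matches any line-break sequence, and (?=<) is a look-ahead meaning "the next character is '<'". The split therefore consumes the line break but keeps the '<' that starts the next section.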
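A minimal runnable version of the split approach, using the sample input from the question. The class and method names are hypothetical, and `\R` requires Java 8 or later.

```java
class SplitSections {
    // Split at every line break (\R) that is immediately followed by '<'.
    // The line break itself is consumed, so no section keeps a trailing newline;
    // the look-ahead (?=<) consumes nothing, so each section keeps its "<...>" header.
    static String[] split(String input) {
        return input.split("\\R(?=<)");
    }

    public static void main(String[] args) {
        String input = "<section1>\nBase_Currency=EUR\nDescription=Revaluation\nGrouping_File\n<section2>\nfoo";
        String[] sections = split(input);
        System.out.println(sections.length); // 2
        System.out.println(sections[0]);     // the <section1> block, without the <section2> line
    }
}
```

Because `\R` also matches `\r\n` as a single unit, the same call should work on Windows-style line endings.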

However, if that is not the case, you need to reach into the regex toolbox:

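  • the DOTALL flag, so that . also matches line terminators
  • the MULTILINE flag, so that ^ matches at the start of every line
  • a negative look-ahead, so that you stop consuming at the start of the next section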
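To see what each of these three tools contributes on its own, here is a small sketch using the `Pattern.DOTALL` and `Pattern.MULTILINE` constants, which are the flag equivalents of the inline `(?s)` and `(?m)`. The two-line test string and the helper name `finds` are illustrative assumptions.

```java
import java.util.regex.Pattern;

class RegexToolboxDemo {
    // Hypothetical helper: does `regex`, compiled with `flags`, occur anywhere in `text`?
    static boolean finds(String regex, int flags, String text) {
        return Pattern.compile(regex, flags).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "line1\n<section2>";

        // DOTALL, inline (?s): without it '.' does not match '\n'; with it '.' crosses lines.
        System.out.println(finds("line1.*<section2>", 0, text));              // false
        System.out.println(finds("line1.*<section2>", Pattern.DOTALL, text)); // true

        // MULTILINE, inline (?m): without it '^' matches only at the start of the input;
        // with it '^' also matches right after a line terminator.
        System.out.println(finds("^<", 0, text));                 // false
        System.out.println(finds("^<", Pattern.MULTILINE, text)); // true

        // Negative look-ahead (?!^<): the '\n' after "line1" may be consumed only if
        // the next line does not start with '<', so the second pattern fails here.
        System.out.println(finds("1.", Pattern.DOTALL, text));                           // true
        System.out.println(finds("1.(?!^<)", Pattern.DOTALL | Pattern.MULTILINE, text)); // false
    }
}
```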

Assuming every "section" starts with "<" at the beginning of a line:

"(?sm)^<\\w+>(.(?!^<))*"

Use it like this:

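import java.util.regex.Matcher;
import java.util.regex.Pattern;
String input = "<section1>\nBase_Currency=EUR\nDescription=Revaluation\nGrouping_File\n<section2>\nfoo";
Matcher matcher = Pattern.compile("(?sm)^<\\w+>(.(?!^<))*").matcher(input);
while (matcher.find()) {
    String section = matcher.group(); // one whole section, excluding the next header
    System.out.println(section);
}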
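The same extraction can be wrapped in a reusable method. This sketch (class and method names are hypothetical) compiles the pattern once into a constant and passes `Pattern.DOTALL | Pattern.MULTILINE` instead of the inline `(?sm)`, which should behave identically, then collects every section into a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SectionExtractor {
    // ^<\w+>     : a header such as "<section1>" at the start of a line
    // (.(?!^<))* : any characters (newlines included), stopping before a line that starts with '<'
    private static final Pattern SECTION =
            Pattern.compile("^<\\w+>(.(?!^<))*", Pattern.DOTALL | Pattern.MULTILINE);

    static List<String> extractSections(String input) {
        List<String> sections = new ArrayList<>();
        Matcher matcher = SECTION.matcher(input);
        while (matcher.find()) {
            sections.add(matcher.group()); // whole match: header plus body, no trailing newline
        }
        return sections;
    }

    public static void main(String[] args) {
        String input = "<section1>\nBase_Currency=EUR\nDescription=Revaluation\nGrouping_File\n<section2>\nfoo";
        for (String section : extractSections(input)) {
            System.out.println("---\n" + section);
        }
    }
}
```

Note that the body stops just before the line break that precedes the next header, so the first result is exactly the block asked for in the question.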

If your input looks like this:

<section1>
Base_Currency=EUR
Description=Revaluation
Grouping_File
<section2>
Base_Currency=EUR
Description=Revaluation
Grouping_File
<section3>
Base_Currency=EUR
Description=Revaluation
Grouping_File

then you can use the following regex:

(?s)(<section\d+>.*?)(?=<section\d+>|$)

Explanation of the regex:

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (?s)                     set flags for this block (with . matching
                           \n) (case-sensitive) (with ^ and $
                           matching normally) (matching whitespace
                           and # normally)
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    <section                 '<section'
--------------------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
--------------------------------------------------------------------------------
    >                        '>'
--------------------------------------------------------------------------------
    .*?                      any character (0 or more times (matching
                             the least amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    <section                 '<section'
--------------------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
--------------------------------------------------------------------------------
    >                        '>'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    $                        before an optional \n, and the end of
                             the string
--------------------------------------------------------------------------------
  )                        end of look-ahead
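A sketch of how the lazy regex above could be used from Java, with the backslashes doubled for a string literal (class and method names are hypothetical). Unlike the `(?sm)^<\w+>...` pattern, each match here keeps the line break that precedes the next header; call `stripTrailing()` (Java 11+) on the result if you do not want it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazySectionDemo {
    // .*? is lazy, so each match ends at the first following "<sectionN>" header
    // or, via $, at the end of the input.
    private static final Pattern SECTION =
            Pattern.compile("(?s)(<section\\d+>.*?)(?=<section\\d+>|$)");

    static List<String> sections(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = SECTION.matcher(input);
        while (m.find()) {
            result.add(m.group(1)); // header plus body; the look-ahead consumed nothing
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "<section1>\nBase_Currency=EUR\nDescription=Revaluation\nGrouping_File\n"
                + "<section2>\nBase_Currency=EUR\nDescription=Revaluation\nGrouping_File\n"
                + "<section3>\nBase_Currency=EUR\nDescription=Revaluation\nGrouping_File";
        for (String s : sections(input)) {
            System.out.println("---\n" + s);
        }
    }
}
```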

If you only want to match one section at a time and the section bodies never contain '<', you can use

(?s)(<section\d+>[^<]*)

Explanation of this regex:

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (?s)                     set flags for this block (with . matching
                           \n) (case-sensitive) (with ^ and $
                           matching normally) (matching whitespace
                           and # normally)
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    <section                 '<section'
--------------------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
--------------------------------------------------------------------------------
    >                        '>'
--------------------------------------------------------------------------------
    [^<]*                    any character except: '<' (0 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
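A sketch of the `[^<]*` variant in Java (class and method names are hypothetical). Because `[^<]` already matches newlines and the pattern contains no `.`, the `(?s)` flag has no effect here. The trade-off is that a `<` inside a value ends the match early, which the second call in `main` illustrates with made-up input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NegatedClassDemo {
    // [^<]* consumes everything up to (not including) the next '<', newlines included.
    private static final Pattern SECTION = Pattern.compile("(?s)(<section\\d+>[^<]*)");

    static List<String> sections(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = SECTION.matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Works when the bodies contain no '<' ...
        System.out.println(sections("<section1>\nA\n<section2>\nB"));
        // ... but a '<' inside a value ends the match early: section1 is cut after "a".
        System.out.println(sections("<section1>\nDescription=a<b\n<section2>\nx"));
    }
}
```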


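Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.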

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM