How to match a block from start to end using regex
I want to capture the whole block from a start header up to, but not including, the end header. For example:
<section1>
Base_Currency=EUR
Description=Revaluation
Grouping_File
<section2>
The match result should be:
<section1>
Base_Currency=EUR
Description=Revaluation
Grouping_File
The question is: how do I write a regex pattern for this match in Java?
If your entire input is in this format, you can simply split it:
String[] sections = input.split("\\R(?=<)");
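\\R (the regex \R written as a Java string literal) matches "any line break sequence", and (?=<) is a lookahead meaning "the next character is '<'". Note that \R is available from Java 8 onward.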
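As a rough illustration of the split approach, here is a minimal sketch. The class name SplitSections and the sample input are made up for this example, and it assumes every section header starts a line with '<'.

```java
class SplitSections {
    // Splits at every line break that is immediately followed by '<'.
    // The line break itself is consumed by the split, so no section
    // ends with a trailing newline. \R requires Java 8 or later.
    static String[] split(String input) {
        return input.split("\\R(?=<)");
    }

    public static void main(String[] args) {
        String input = "<section1>\nBase_Currency=EUR\nDescription=Revaluation\n"
                + "Grouping_File\n<section2>\nfoo";
        for (String section : split(input)) {
            System.out.println("[" + section + "]");
        }
    }
}
```

Because the lookahead does not consume the '<', each piece keeps its own header.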
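However, if that is not the case, you need to reach into the regex toolbox:
the DOTALL flag, so that the dot also matches line breaks;
the MULTILINE flag, so that ^ matches at the start of every line.
Assuming a "section" starts with "<" at the beginning of a line: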
"(?sm)^<\\w+>(.(?!^<))*"
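Use it like this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "<section1>\nBase_Currency=EUR\nDescription=Revaluation\nGrouping_File\n<section2>\nfoo";
Matcher matcher = Pattern.compile("(?sm)^<\\w+>(.(?!^<))*").matcher(input);
while (matcher.find()) {
    // Each match is one section: the header plus everything up to,
    // but not including, the line break before the next header.
    String section = matcher.group();
}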
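For completeness, a self-contained sketch that collects the matches into a list. The class and method names (SectionFinder, findSections) are hypothetical, and text before the first header is simply skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SectionFinder {
    // (?s): dot also matches line breaks; (?m): ^ matches at every line start.
    // (.(?!^<))* consumes a character only if it is not directly followed by
    // a '<' at the start of a line, so the match stops before that line break.
    private static final Pattern SECTION =
            Pattern.compile("(?sm)^<\\w+>(.(?!^<))*");

    static List<String> findSections(String input) {
        List<String> sections = new ArrayList<>();
        Matcher matcher = SECTION.matcher(input);
        while (matcher.find()) {
            sections.add(matcher.group());
        }
        return sections;
    }

    public static void main(String[] args) {
        String input = "preamble\n<section1>\nBase_Currency=EUR\n<section2>\nfoo";
        findSections(input).forEach(s -> System.out.println("[" + s + "]"));
    }
}
```

Compiling the pattern once into a constant avoids recompiling it on every call.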
If your input looks like this
<section1>
Base_Currency=EUR
Description=Revaluation
Grouping_File
<section2>
Base_Currency=EUR
Description=Revaluation
Grouping_File
<section3>
Base_Currency=EUR
Description=Revaluation
Grouping_File
then you can use the following regex
(?s)(<section\d+>.*?)(?=<section\d+>|$)
The explanation of the regex is
NODE                     EXPLANATION
--------------------------------------------------------------------------------
(?s)                     set flags for this block (with . matching
                         \n) (case-sensitive) (with ^ and $
                         matching normally) (matching whitespace
                         and # normally)
--------------------------------------------------------------------------------
(                        group and capture to \1:
--------------------------------------------------------------------------------
  <section               '<section'
--------------------------------------------------------------------------------
  \d+                    digits (0-9) (1 or more times (matching
                         the most amount possible))
--------------------------------------------------------------------------------
  >                      '>'
--------------------------------------------------------------------------------
  .*?                    any character (0 or more times (matching
                         the least amount possible))
--------------------------------------------------------------------------------
)                        end of \1
--------------------------------------------------------------------------------
(?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
  <section               '<section'
--------------------------------------------------------------------------------
  \d+                    digits (0-9) (1 or more times (matching
                         the most amount possible))
--------------------------------------------------------------------------------
  >                      '>'
--------------------------------------------------------------------------------
 |                       OR
--------------------------------------------------------------------------------
  $                      before an optional \n, and the end of
                         the string
--------------------------------------------------------------------------------
)                        end of look-ahead
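In Java, this regex could be applied roughly as follows. This is only a sketch: the class name LazySectionMatcher and the shortened sample input are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazySectionMatcher {
    // Same regex as above; each backslash is doubled in the Java string literal.
    private static final Pattern SECTION =
            Pattern.compile("(?s)(<section\\d+>.*?)(?=<section\\d+>|$)");

    static List<String> sections(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = SECTION.matcher(input);
        while (m.find()) {
            // Group 1 runs right up to the next header, so it still
            // contains the line break that precedes that header.
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "<section1>\nA=1\n<section2>\nB=2\n<section3>\nC=3";
        // trim() drops the trailing line break for display purposes.
        sections(input).forEach(s -> System.out.println("[" + s.trim() + "]"));
    }
}
```

Unlike the ^-anchored pattern, this one does not require the header to start a line, but it only recognizes headers of the form <sectionN>.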
If you only want to match a single tag, you can use
(?s)(<section\d+>[^<]*)
The explanation of this regex is
NODE                     EXPLANATION
--------------------------------------------------------------------------------
(?s)                     set flags for this block (with . matching
                         \n) (case-sensitive) (with ^ and $
                         matching normally) (matching whitespace
                         and # normally)
--------------------------------------------------------------------------------
(                        group and capture to \1:
--------------------------------------------------------------------------------
  <section               '<section'
--------------------------------------------------------------------------------
  \d+                    digits (0-9) (1 or more times (matching
                         the most amount possible))
--------------------------------------------------------------------------------
  >                      '>'
--------------------------------------------------------------------------------
  [^<]*                  any character except: '<' (0 or more
                         times (matching the most amount
                         possible))
--------------------------------------------------------------------------------
)                        end of \1
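A small Java sketch of this single-tag variant follows; the class and method names are hypothetical. Two things are worth noting: the (?s) flag has no effect here because the pattern contains no dot, and [^<]* stops at the first '<' anywhere, so a '<' inside the section body cuts the match short.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SingleSectionMatcher {
    // [^<] already matches line breaks, so the section body may span lines.
    private static final Pattern SECTION =
            Pattern.compile("(?s)(<section\\d+>[^<]*)");

    // Returns the first section found, or an empty Optional if there is none.
    static Optional<String> firstSection(String input) {
        Matcher m = SECTION.matcher(input);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String input = "<section1>\nBase_Currency=EUR\n<section2>\nfoo";
        firstSection(input).ifPresent(s -> System.out.println("[" + s.trim() + "]"));
    }
}
```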