Regex matching the first occurrence of a string vs. the last matching string

Question

I have the following list:

Acid
stuff
goo
nasty
Probable
Acid
more stuff
Probable
Acid 
fff
ggg
Probable

I want to match everything between Acid and Probable. However, my regex returns only the last match (Acid, fff, ggg, Probable), not the first (Acid, stuff, goo, nasty, Probable).

The calling class:

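    public static void main(String[] args) throws IOException {
        // Extract the PDF text, then run the extraction only on matching documents
        PDFManager pdfManager = new PDFManager();
        pdfManager.setFilePath("MyFile.pdf");
        String s = pdfManager.ToText();

        if (s.contains("Thresholds")) {
            BravoaltDoc_ExtractionNonDays Sum = new BravoaltDoc_ExtractionNonDays(s);
            Sum.ExtractSumNew(s);
        }
    }

    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class BravoaltDoc_ExtractionNonDays {
        String doc;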

    ArrayList<String> Day_arr = new ArrayList<String>();
    ArrayList<List<String>> Day_table2d = new ArrayList<List<String>>();
    String [] seTab3Landmarks=null;

    public BravoaltDoc_ExtractionNonDays(String doc) {
        this.doc=doc;
    }

    public String ExtractSumNew(String doc) {
        Pattern Tab3Landmarks_pattern = Pattern.compile("Acid?(.*?)Probable",Pattern.DOTALL);
        Matcher matcherTab3Landmarks_pattern = Tab3Landmarks_pattern.matcher(doc);
        while (matcherTab3Landmarks_pattern.find()) {
            doc=matcherTab3Landmarks_pattern.group(1);
            seTab3Landmarks=matcherTab3Landmarks_pattern.group(1).split("\\n|\\r");
        }
        for (String n:seTab3Landmarks){
            System.out.println(n);
        }
        return doc;

    }

}

Answer 1

Description

This regex will do the following:

Matches each substring that starts at Acid and ends at Probable.
Requires Acid and Probable to be on their own lines. If they are embedded in the middle of a string, such as gooProbablegoo, they won't match.

For this regex I used the case-insensitive flag and the dot-matches-newline flag.

(?:\r|\n|\A)\s*Acid\s*?[\r\n].*?[\r\n]\s*Probable\s*?(?:\r|\n|\Z)


Example

Sample Text

Note the difficult edge case on the third line.

Acid
stuff
gooProbablegoo
nasty
Probable
Acid
more stuff
Probable
Acid
fff
ggg
Probable

Matches

[0][0] = Acid
stuff
gooProbablegoo
nasty
Probable

[1][0] = 
Acid
more stuff
Probable

[2][0] = 
Acid
fff
ggg
Probable

Explained

NODE                     EXPLANATION
----------------------------------------------------------------------
  (?:                      group, but do not capture:
----------------------------------------------------------------------
    \r                       '\r' (carriage return)
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    \n                       '\n' (newline)
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    \A                       the beginning of the string
----------------------------------------------------------------------
  )                        end of grouping
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  Acid                     'Acid'
----------------------------------------------------------------------
  \s*?                     whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the least amount
                           possible))
----------------------------------------------------------------------
  [\r\n]                   any character of: '\r' (carriage return),
                           '\n' (newline)
----------------------------------------------------------------------
  .*?                      any character (0 or more times (matching
                           the least amount possible))
----------------------------------------------------------------------
  [\r\n]                   any character of: '\r' (carriage return),
                           '\n' (newline)
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  Probable                 'Probable'
----------------------------------------------------------------------
  \s*?                     whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the least amount
                           possible))
----------------------------------------------------------------------
  (?:                      group, but do not capture:
----------------------------------------------------------------------
    \r                       '\r' (carriage return)
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    \n                       '\n' (newline)
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    \Z                       before an optional \n, and the end of
                             the string
----------------------------------------------------------------------
  )                        end of grouping
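
To try the expression from Java, it has to be escaped for a string literal and compiled with the two flags mentioned above. The following is a minimal sketch, not a drop-in for the question's class: the class name LineAnchoredBlocks and the helper findBlocks are made up for illustration, and the sample text is assumed to use \r\n line endings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name is not from the original post.
class LineAnchoredBlocks {
    // The expression from this answer, escaped for a Java string literal.
    private static final Pattern BLOCK = Pattern.compile(
            "(?:\\r|\\n|\\A)\\s*Acid\\s*?[\\r\\n].*?[\\r\\n]\\s*Probable\\s*?(?:\\r|\\n|\\Z)",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns every full match, with the surrounding line breaks trimmed off. */
    static List<String> findBlocks(String text) {
        List<String> blocks = new ArrayList<>();
        Matcher m = BLOCK.matcher(text);
        while (m.find()) {
            blocks.add(m.group().trim());
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Assumption: Windows-style \r\n line endings in the input.
        String text = String.join("\r\n",
                "Acid", "stuff", "gooProbablegoo", "nasty", "Probable",
                "Acid", "more stuff", "Probable",
                "Acid", "fff", "ggg", "Probable");
        for (String block : findBlocks(text)) {
            System.out.println("---");
            System.out.println(block);
        }
    }
}
```

One thing to watch for: the expression consumes a line-break character on both sides of each block. With \r\n endings there are two characters to share between adjacent blocks, but with bare \n endings the single newline after one Probable is used up by that match, so a block starting on the very next line can be skipped.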

Answer 2

Your code correctly finds all the matches. However, since each call to find() reassigns seTab3Landmarks, only the last match is printed at the end.

If you only want the first match, use an "if" block instead of a "while" block (which finds all matches).
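
The difference between the two loops can be sketched as follows. This is an illustration rather than a rewrite of the question's class: the class name AcidBlocks and its helper methods are hypothetical, and the pattern drops the "?" after "Acid" from the question, since that quantifier only makes the final "d" optional.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are not from the original post.
class AcidBlocks {
    private static final Pattern BLOCK =
            Pattern.compile("Acid(.*?)Probable", Pattern.DOTALL);

    /** Returns the lines of the first block only, or an empty list if none is found. */
    static List<String> firstBlock(String doc) {
        Matcher m = BLOCK.matcher(doc);
        if (m.find()) {                          // "if": stop after the first match
            return splitLines(m.group(1));
        }
        return new ArrayList<>();
    }

    /** Returns every block, one list of lines per match. */
    static List<List<String>> allBlocks(String doc) {
        List<List<String>> blocks = new ArrayList<>();
        Matcher m = BLOCK.matcher(doc);
        while (m.find()) {                       // "while": visit each match in turn
            blocks.add(splitLines(m.group(1)));  // append instead of overwriting
        }
        return blocks;
    }

    /** Splits on any line break and drops blank lines. */
    private static List<String> splitLines(String body) {
        List<String> lines = new ArrayList<>();
        for (String line : body.split("\\R")) {
            if (!line.trim().isEmpty()) {
                lines.add(line.trim());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String doc = "Acid\nstuff\ngoo\nnasty\nProbable\n"
                + "Acid\nmore stuff\nProbable\n"
                + "Acid\nfff\nggg\nProbable";
        System.out.println(firstBlock(doc)); // [stuff, goo, nasty]
        System.out.println(allBlocks(doc));  // [[stuff, goo, nasty], [more stuff], [fff, ggg]]
    }
}
```

Collecting each match into a list keeps the "while" loop useful when all blocks are needed, while the "if" version returns as soon as the first block is found.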

2 answers

Answer 1: score 2, posted 2016-05-06 01:22:47
Answer 2: score 1, accepted, posted 2016-05-06 01:42:31
