Regex to match first occurrence of a string is matching the last
I have the following list:
Acid
stuff
goo
nasty
Probable
Acid
more stuff
Probable
Acid
fff
ggg
Probable
I want to match everything between Acid and Probable. However, my regex returns only the last match (Acid, fff, ggg, Probable), not the first (Acid, stuff, goo, nasty, Probable).
The calling class:
public static void main(String[] args) throws IOException {
    PDFManager pdfManager = new PDFManager();
    pdfManager.setFilePath("MyFile.pdf");
    String s = pdfManager.ToText();
    if (s.contains("Thresholds")) {
        BravoaltDoc_ExtractionNonDays Sum = new BravoaltDoc_ExtractionNonDays(s);
        Sum.ExtractSumNew(s);
    }
}

public class BravoaltDoc_ExtractionNonDays {
    String doc;
    ArrayList<String> Day_arr = new ArrayList<String>();
    ArrayList<List<String>> Day_table2d = new ArrayList<List<String>>();
    String[] seTab3Landmarks = null;

    public BravoaltDoc_ExtractionNonDays(String doc) {
        this.doc = doc;
    }

    public String ExtractSumNew(String doc) {
        Pattern Tab3Landmarks_pattern = Pattern.compile("Acid?(.*?)Probable", Pattern.DOTALL);
        Matcher matcherTab3Landmarks_pattern = Tab3Landmarks_pattern.matcher(doc);
        while (matcherTab3Landmarks_pattern.find()) {
            doc = matcherTab3Landmarks_pattern.group(1);
            seTab3Landmarks = matcherTab3Landmarks_pattern.group(1).split("\\n|\\r");
        }
        for (String n : seTab3Landmarks) {
            System.out.println(n);
        }
        return docSlim; // docSlim is not declared in the snippet as posted
    }
}
This regex will do the following: it matches each substring that starts with Acid and runs to Probable, and it requires Acid and Probable to be on their own lines. If they are embedded in the middle of a string like gooProbablegoo, they won't match.
For this regex I used the Case Insensitive flag and the Dot Matches New Line flag.
(?:\r|\n|\A)\s*Acid\s*?[\r\n].*?[\r\n]\s*Probable\s*?(?:\r|\n|\Z)
Sample Text
Note: the difficult edge case in the third line.
Acid
stuff
gooProbablegoo
nasty
Probable
Acid
more stuff
Probable
Acid
fff
ggg
Probable
Matches
[0][0] = Acid
stuff
gooProbablegoo
nasty
Probable
[1][0] =
Acid
more stuff
Probable
[2][0] =
Acid
fff
ggg
Probable
NODE                     EXPLANATION
----------------------------------------------------------------------
(?:                      group, but do not capture:
----------------------------------------------------------------------
  \r                     '\r' (carriage return)
----------------------------------------------------------------------
  |                      OR
----------------------------------------------------------------------
  \n                     '\n' (newline)
----------------------------------------------------------------------
  |                      OR
----------------------------------------------------------------------
  \A                     the beginning of the string
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
\s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                         more times (matching the most amount
                         possible))
----------------------------------------------------------------------
Acid                     'Acid'
----------------------------------------------------------------------
\s*?                     whitespace (\n, \r, \t, \f, and " ") (0 or
                         more times (matching the least amount
                         possible))
----------------------------------------------------------------------
[\r\n]                   any character of: '\r' (carriage return),
                         '\n' (newline)
----------------------------------------------------------------------
.*?                      any character (0 or more times (matching
                         the least amount possible))
----------------------------------------------------------------------
[\r\n]                   any character of: '\r' (carriage return),
                         '\n' (newline)
----------------------------------------------------------------------
\s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                         more times (matching the most amount
                         possible))
----------------------------------------------------------------------
Probable                 'Probable'
----------------------------------------------------------------------
\s*?                     whitespace (\n, \r, \t, \f, and " ") (0 or
                         more times (matching the least amount
                         possible))
----------------------------------------------------------------------
(?:                      group, but do not capture:
----------------------------------------------------------------------
  \r                     '\r' (carriage return)
----------------------------------------------------------------------
  |                      OR
----------------------------------------------------------------------
  \n                     '\n' (newline)
----------------------------------------------------------------------
  |                      OR
----------------------------------------------------------------------
  \Z                     before an optional \n, and the end of
                         the string
----------------------------------------------------------------------
)                        end of grouping
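
For reference, here is a rough sketch of how the line-anchored pattern might be applied from Java, with the two flags passed as Pattern.CASE_INSENSITIVE and Pattern.DOTALL. The class name LineAnchoredBlocks and the helper findBlocks are made up for illustration. The sample text is joined with \r\n on purpose: the pattern consumes one line-break character on each side of a block, so with bare \n line endings a block that directly follows another one has no line break left to start on and is likely to be skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineAnchoredBlocks {
    // The line-anchored pattern, with backslashes doubled for a Java string literal.
    static final Pattern BLOCK = Pattern.compile(
            "(?:\\r|\\n|\\A)\\s*Acid\\s*?[\\r\\n].*?[\\r\\n]\\s*Probable\\s*?(?:\\r|\\n|\\Z)",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Hypothetical helper: returns every matched block, trimmed of the
    // line-break characters the pattern consumes around it.
    static List<String> findBlocks(String text) {
        List<String> blocks = new ArrayList<>();
        Matcher m = BLOCK.matcher(text);
        while (m.find()) {
            blocks.add(m.group().trim());
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Sample text with \r\n line endings (an assumption of this sketch).
        String text = String.join("\r\n",
                "Acid", "stuff", "gooProbablegoo", "nasty", "Probable",
                "Acid", "more stuff", "Probable",
                "Acid", "fff", "ggg", "Probable");
        for (String block : findBlocks(text)) {
            System.out.println("---");
            System.out.println(block);
        }
    }
}
```

The embedded gooProbablegoo line does not end the first block, because Probable must be preceded by a line break and optional whitespace.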
Your code correctly finds all the matches. However, since each find() re-assigns seTab3Landmarks, only the last match is printed out at the end.
If you only want the first match, use an "if" block instead of a "while" block (which iterates over all matches).
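
A minimal sketch of that change, assuming the text has already been extracted into a String. The class FirstMatchDemo and its two helpers are hypothetical names, not part of the question's code. The pattern drops the ? after Acid, since in the posted pattern it only makes the letter d optional. The second helper shows the alternative of keeping the loop but accumulating each match instead of overwriting it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstMatchDemo {
    // Pattern from the question, minus the '?' after "Acid".
    private static final Pattern BLOCK =
            Pattern.compile("Acid(.*?)Probable", Pattern.DOTALL);

    // Lines between the first Acid/Probable pair, or an empty list if none.
    static List<String> firstBlock(String doc) {
        Matcher m = BLOCK.matcher(doc);
        if (m.find()) { // "if", not "while": stop at the first match
            return Arrays.asList(m.group(1).trim().split("\\R"));
        }
        return new ArrayList<>();
    }

    // Every block, appended to a list so earlier matches are not lost.
    static List<List<String>> allBlocks(String doc) {
        List<List<String>> blocks = new ArrayList<>();
        Matcher m = BLOCK.matcher(doc);
        while (m.find()) {
            blocks.add(Arrays.asList(m.group(1).trim().split("\\R")));
        }
        return blocks;
    }

    public static void main(String[] args) {
        String doc = String.join("\n",
                "Acid", "stuff", "goo", "nasty", "Probable",
                "Acid", "more stuff", "Probable",
                "Acid", "fff", "ggg", "Probable");
        System.out.println(firstBlock(doc));       // [stuff, goo, nasty]
        System.out.println(allBlocks(doc).size()); // 3
    }
}
```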