Matching 3 strings with regex
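I have the following text: Invoice n.ro per 006390 BENETTON RUSSIA OOO 2019 0051035408
I need to check whether the text contains Invoice and 2019 (4 digits), followed by another n digits after those 4 digits. So I want to read the Invoice name, skip the first line, and then get the elements of the second line, like this: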
import java.io.File;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.pdfbox.multipdf.Splitter; // Apache PDFBox 2.x
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

File file = new File(this.fileName); // create a File object from the String path
final Pattern invoice = Pattern.compile("^Invoice n ([0-9])+$"); // regular expression for the line we are looking for
PDDocument pdDocument = PDDocument.load(file); // create a PDDocument and load the file from that path
Splitter splitter = new Splitter(); // splits the document into pages
PDFTextStripper stripper = new PDFTextStripper(); // extracts plain text and ignores all formatting
Matcher matcher;
String resultInvoiceNumber = "";
List<PDDocument> split = splitter.split(pdDocument); // split() returns one PDDocument per page
for (PDDocument pd : split) { // loop over the split pages
    String s = stripper.getText(pd); // text of a single page, kept in a String for further processing
    // ... (rest of the loop not shown in the question)
}
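In the question, `s` holds the text of a whole page, and the pattern `^Invoice n ([0-9])+$` cannot match the sample line: it requires a space directly after `n` (the text has `n.ro`) and nothing but digits from there to the end. In addition, without `Pattern.MULTILINE`, `^` and `$` anchor to the start and end of the whole page string rather than to each line. Below is a minimal, stdlib-only sketch of a looser line pattern; the page text is a hypothetical stand-in for what `stripper.getText(pd)` might return, and the class name `InvoicePageMatch` is made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InvoicePageMatch {
    public static void main(String[] args) {
        // Hypothetical page text standing in for the output of stripper.getText(pd).
        String page = "BENETTON RUSSIA OOO\n"
                + "Invoice n.ro per 006390 BENETTON RUSSIA OOO 2019 0051035408\n"
                + "Some other line\n";

        // Pattern from the question (with MULTILINE added): it needs "Invoice n "
        // with a space right after "n", so it cannot match "Invoice n.ro ...".
        Pattern original = Pattern.compile("^Invoice n ([0-9])+$", Pattern.MULTILINE);
        System.out.println(original.matcher(page).find());

        // Looser pattern: a line that starts with "Invoice" and ends with a
        // 4-digit number, whitespace, and one or more further digits.
        // MULTILINE makes ^ and $ match at every line break, not only at the ends of the page.
        Pattern invoiceLine = Pattern.compile(
                "^Invoice\\b.*?\\b(\\d{4})\\s+(\\d+)$", Pattern.MULTILINE);
        Matcher m = invoiceLine.matcher(page);
        while (m.find()) {
            System.out.println("year = " + m.group(1) + ", number = " + m.group(2));
        }
    }
}
```

The lazy `.*?` lets the text between `Invoice` and the number pair vary, and the two capture groups return the 4-digit value and the trailing number separately, so no string splitting is needed afterwards.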
You can try something like this, based on groups:
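import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexpTest {
    public static void main(String[] args) {
        final String input = "Invoice n.ro per 006390 BENETTON RUSSIA OOO 2019 0051035408";
        // group 1: "Invoice" zero or more times; group 2: a 4-digit number, whitespace, then more digits
        final Pattern pattern = Pattern.compile("(Invoice)*(\\s*\\d{4}\\s+\\d+\\s*)");
        final Matcher matcher = pattern.matcher(input);
        System.out.println(matcher.find());  // true if a match is found anywhere in the input
        System.out.println(matcher.group()); // the whole matched text
    }
}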
Output:
true
2019 0051035408
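Note that this pattern does not actually require `Invoice`: `(Invoice)*` means zero or more occurrences, so any text containing a run of 4 digits followed by whitespace and more digits matches. Also, `matcher.group()` returns the whole match, including any whitespace consumed by the `\\s*` parts. A minimal sketch of a stricter variant, assuming `Invoice` must appear somewhere before the number pair; the second input string and the class name `RegexpStrictTest` are hypothetical, chosen only to show the difference:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexpStrictTest {
    public static void main(String[] args) {
        // The answer's pattern: (Invoice)* allows zero occurrences, so "Invoice" is optional.
        Pattern answer = Pattern.compile("(Invoice)*(\\s*\\d{4}\\s+\\d+\\s*)");
        // Stricter variant (assumption): "Invoice" first, then a 4-digit number,
        // whitespace and more digits, each part in its own capture group.
        Pattern strict = Pattern.compile("Invoice\\b.*?\\b(\\d{4})\\s+(\\d+)");

        String[] inputs = {
            "Invoice n.ro per 006390 BENETTON RUSSIA OOO 2019 0051035408",
            "Receipt BENETTON RUSSIA OOO 2019 0051035408" // hypothetical text without "Invoice"
        };
        for (String input : inputs) {
            boolean answerFound = answer.matcher(input).find();
            Matcher m = strict.matcher(input);
            boolean strictFound = m.find();
            // Prints both results, plus the captured groups when the strict pattern matches.
            System.out.println(answerFound + " " + strictFound
                    + (strictFound ? " " + m.group(1) + " " + m.group(2) : ""));
        }
    }
}
```

If the 4-digit value must be exactly 2019 rather than any year, `\\d{4}` can be replaced with the literal `2019`.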