Matching 3 strings with regex

I have the following text: Invoice n.ro per 006390 BENETTON RUSSIA OOO 2019 0051035408

I need to check whether the text contains Invoice and a 4-digit year such as 2019, and whether those 4 digits are followed by another run of n digits. My idea was to match the Invoice line, skip to the second line, and read its elements like this:


    File file = new File(this.fileName); // file object built from the String path
    final Pattern invoice = Pattern.compile("^Invoice n ([0-9])+$"); // regular expression for the text we are looking for

    PDDocument pdDocument = PDDocument.load(file); // load the PDF from that file
    Splitter splitter = new Splitter(); // splits the document into pages
    PDFTextStripper stripper = new PDFTextStripper(); // extracts the text and ignores all formatting
    Matcher matcher;
    String resultInvoiceNumber = "";

    List<PDDocument> split = splitter.split(pdDocument); // one PDDocument per page

    for (PDDocument pd : split) { // loop over the split pages
        String s = stripper.getText(pd); // text of a single page, kept as a String for further matching

The question was edited, but for the original string with the numbers on a new line, you could match n. and then the rest of the line. Then match a Unicode newline sequence using \R, match 1+ horizontal whitespace characters using \h+, and match the numbers.

The numbers at the end of the second line are in capturing group 1.

^Invoice n\..*\R\h+[0-9]{4} ([0-9]+)$


In Java, with the backslashes doubled for the string literal:

String regex = "^Invoice n\\..*\\R\\h+[0-9]{4} ([0-9]+)$";
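As a rough sketch of how that pattern might be applied, the class below runs it with Pattern.MULTILINE so that ^ and $ anchor at line boundaries inside a longer page of extracted text. The class name InvoiceLineMatcher, the method extractNumber, and the sample string (year and number indented on the second line) are assumptions for illustration, since the original multi-line input is not shown in the question.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InvoiceLineMatcher {

    // MULTILINE lets ^ and $ match at line boundaries, which matters when the
    // invoice lines sit in the middle of a page of extracted text.
    private static final Pattern INVOICE = Pattern.compile(
            "^Invoice n\\..*\\R\\h+[0-9]{4} ([0-9]+)$", Pattern.MULTILINE);

    // Returns the digits after the 4-digit year, or empty if nothing matches.
    static Optional<String> extractNumber(String text) {
        Matcher m = INVOICE.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical sample: year and number on an indented second line.
        String sample = "Invoice n.ro per 006390 BENETTON RUSSIA OOO\n"
                + "  2019 0051035408";
        System.out.println(extractNumber(sample).orElse("no match"));
    }
}
```

Because \h+ requires at least one horizontal whitespace character, a second line that starts directly with the year would not match; \h* could be used instead if the indentation is optional.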

You can try something like this based on groups:

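import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexpTest {

    public static void main(String[] args) {
        final String input = "Invoice n.ro per 006390 BENETTON RUSSIA OOO 2019 0051035408";
        // Optional "Invoice", then a 4-digit number, whitespace, and a run of digits
        final Pattern pattern = Pattern.compile("(Invoice)*(\\s*\\d{4}\\s+\\d+\\s*)");

        final Matcher matcher = pattern.matcher(input);
        System.out.println(matcher.find());  // whether a match was found
        System.out.println(matcher.group()); // the matched text
    }
}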

Output:

true
 2019 0051035408
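Note that (Invoice)* in the pattern above may match zero times, so that pattern also succeeds on text that does not contain Invoice at all. A minimal sketch of a stricter single-line variant follows, assuming the year and the number should be captured separately. The class name InvoiceGroups, the method parse, and the regex itself are illustrative and not taken from the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InvoiceGroups {

    // "Invoice" is required; group 1 is a standalone 4-digit run,
    // group 2 is the run of digits that follows it.
    private static final Pattern INVOICE =
            Pattern.compile("Invoice\\b.*?\\b(\\d{4})\\s+(\\d+)\\b");

    // Returns {year, number}, or null when the text does not match.
    static String[] parse(String text) {
        Matcher m = INVOICE.matcher(text);
        return m.find() ? new String[] {m.group(1), m.group(2)} : null;
    }

    public static void main(String[] args) {
        String input = "Invoice n.ro per 006390 BENETTON RUSSIA OOO 2019 0051035408";
        String[] parts = parse(input);
        if (parts != null) {
            System.out.println(parts[0]); // the 4-digit year
            System.out.println(parts[1]); // the digits after it
        }
    }
}
```

The word boundaries keep the 4-digit group from matching inside a longer number such as 006390.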
