
How to search for multiline-parallel text in Java?

Consider a table with the following headers in a text file:

    Table name goes here
                                                                     Page 1
    This is column one                 This is   This
                         This is       column    is column
                         column two f   thre f    three f
                                                 and hal f

     Row1 in column 1    Row2InCol2     Row3       Row4InCol4


                                                                     Page 2


 This is column one                   This is     This
                        This is       column    is column
                        column two f   thre f    three f
                                                and hal f


 Grand Total: -       12               13        25     

I want to search for the column "This is column three f and hal f" so that, when I find this text, I get the String index position where the column starts (the index of "This") and the index position where it ends (the index at which "hal f" ends, that is, the index of the final 'f'). Note that every column contains the word "This" and the letter 'f', and I should be able to find the start and end index of any of the columns in the same way.

I want to be able to do this because I am implementing a parser for tables in a text file in which the index positions of the column headers and column data are not consistent from one page to another (a form feed character indicates the end of a page).

I am not looking for an algorithm as such. I want to know whether the Pattern and Matcher classes (or any other APIs) support multi-line text searches like the one described above.

A simple pattern that has worked for me in the past:

// split on two or more whitespace characters.
String[] fields = line.split("\\s{2,}");

This treats one space as being part of a field.
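
As a rough illustration of that split, here is a minimal sketch applied to the data row from the question's table. The class name SplitOnWideGaps and the helper splitFields are made up for this example, and the trim() call is an added assumption so that leading indentation does not produce an empty first field. Note that this yields the field text only, not the character positions the question asks for.

```java
import java.util.Arrays;

public class SplitOnWideGaps {
    // Hypothetical helper: splits one table row wherever two or more
    // whitespace characters occur in a row.
    public static String[] splitFields(String line) {
        // trim() first; otherwise leading indentation yields an empty first field
        return line.trim().split("\\s{2,}");
    }

    public static void main(String[] args) {
        String row = "     Row1 in column 1    Row2InCol2     Row3       Row4InCol4";
        // Single spaces inside "Row1 in column 1" stay part of the field
        System.out.println(Arrays.toString(splitFields(row)));
    }
}
```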

Because the text you're searching for is a fixed literal, regex is not the right tool here. Just use String.indexOf(String) on the whole text, including newlines, from the first "This" to the last "f":

String target = "This\nThis is       column    is column\n                        column two f   thre f    three f\n                                                and hal f";

int start = input.indexOf(target);   // -1 if the target is not found
int end = start + target.length();   // exclusive end; only meaningful when start >= 0

To find the next occurrence, use String.indexOf(String str, int fromIndex), passing the previous end as fromIndex.
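
The repeated search could look roughly like the sketch below. FindAllOccurrences, findAll, and the shortened two-line header used in main are hypothetical stand-ins, not the real table; the form feed (\f) is only there to mimic the page break mentioned in the question.

```java
import java.util.ArrayList;
import java.util.List;

public class FindAllOccurrences {
    // Hypothetical helper: returns {start, end} pairs (end exclusive) for
    // every non-overlapping occurrence of target in input.
    public static List<int[]> findAll(String input, String target) {
        List<int[]> spans = new ArrayList<>();
        if (target.isEmpty()) {
            return spans; // an empty target would otherwise loop forever
        }
        int from = 0;
        while (true) {
            int start = input.indexOf(target, from);
            if (start < 0) {
                break; // no further occurrence
            }
            int end = start + target.length();
            spans.add(new int[] {start, end});
            from = end; // resume the search after the previous match
        }
        return spans;
    }

    public static void main(String[] args) {
        // Shortened stand-in for a multi-line header, repeated on two "pages"
        String target = "This\n  is column";
        String input = "Page 1\nThis\n  is column\n\fPage 2\nThis\n  is column\n";
        for (int[] span : findAll(input, target)) {
            System.out.println(span[0] + ".." + span[1]);
        }
    }
}
```

Each pair holds a start index and an exclusive end index, so the last character of a match sits at end - 1. The literal only matches when the whitespace between the lines is identical on every page.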
