How to search for multiline-parallel text in Java?

Consider a table with the following headers in a text file:

    Table name goes here
                                                                     Page 1
    This is column one                 This is   This
                         This is       column    is column
                         column two f   thre f    three f
                                                 and hal f

     Row1 in column 1    Row2InCol2     Row3       Row4InCol4


                                                                     Page 2


 This is column one                   This is     This
                        This is       column    is column
                        column two f   thre f    three f
                                                and hal f


 Grand Total: -       12               13        25     

I want to search for the column "This is column three f and hal f" so that, when I find this text, I get the String index position where the column starts (the index of "This") and the index position where it ends (the index at which "hal f" ends, that is, the index of the final 'f'). Note that all the columns contain the word "This" and the letter 'f', and I should be able to find the start index and end index of any of the columns in the same way.

I want to be able to do this because I want to implement a parser for tables in a text file in which the index positions of the column headers and column data are not consistent from one page to the next (a form feed character marks the end of a page).

I am not looking for an algorithm as such. I want to know whether the Pattern and Matcher classes (or any other APIs) support multi-line text searches as described above.

A simple pattern which has worked for me in the past:

// split on two or more whitespace characters
String[] fields = line.split("\\s{2,}");

This treats a single space as part of a field, so only runs of two or more whitespace characters separate fields.
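As a minimal sketch of that idea, the class below splits a row on runs of two or more whitespace characters and also reports where each field starts in the original line. The names FieldSplitter, splitFields and fieldStarts are hypothetical, and the sketch assumes that fields never contain two consecutive spaces and are separated by spaces rather than tabs. The line is trimmed before splitting because a leading run of spaces would otherwise produce an empty first field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldSplitter {
    // A field is a run of non-whitespace tokens joined by single spaces.
    private static final Pattern FIELD = Pattern.compile("\\S+(?: \\S+)*");

    /** Splits on runs of two or more whitespace characters, ignoring leading and trailing whitespace. */
    static String[] splitFields(String line) {
        return line.trim().split("\\s{2,}");
    }

    /** Returns the start index of every field within the original (untrimmed) line. */
    static List<Integer> fieldStarts(String line) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String row = "     Row1 in column 1    Row2InCol2     Row3       Row4InCol4";
        System.out.println(String.join("|", splitFields(row)));
        System.out.println(fieldStarts(row));
    }
}
```

Matcher.start() gives the per-page column positions directly, which may help when the same column sits at a different index on each page.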

Because the text you're searching for is a fixed literal, regex is not the weapon of choice. Just use String.indexOf(String) on the whole text, with a target that includes the newlines, from the first "This" to the last "f":

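// The literal must reproduce the file's whitespace exactly, including the
// leading spaces of each continuation line (spacing here follows the page 2 sample).
String target = "This\n                        This is       column    is column\n                        column two f   thre f    three f\n                                                and hal f";

int start = input.indexOf(target);                   // -1 if the header is not found
int end = start < 0 ? -1 : start + target.length();  // exclusive end index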

To find the next occurrence, use String.indexOf(String str, int fromIndex), passing the previous end as fromIndex.
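A minimal sketch of that loop, assuming the multi-line literal appears with identical whitespace on every page. The class name LiteralOccurrences, the method findAll and the sample input are hypothetical and only illustrate the technique:

```java
import java.util.ArrayList;
import java.util.List;

class LiteralOccurrences {
    /** Returns {start, endExclusive} for every non-overlapping occurrence of target in input. */
    static List<int[]> findAll(String input, String target) {
        List<int[]> spans = new ArrayList<>();
        if (target.isEmpty()) {
            return spans; // an empty target would match at every index
        }
        int from = 0;
        while (true) {
            int start = input.indexOf(target, from);
            if (start < 0) {
                break; // no further occurrence
            }
            int end = start + target.length();
            spans.add(new int[] {start, end});
            from = end; // resume the search after the previous match
        }
        return spans;
    }

    public static void main(String[] args) {
        // Two "pages" separated by a form feed; the target spans a line break.
        String input = "Page 1\nThis is\n  column\fPage 2\nThis is\n  column\n";
        String target = "is\n  column";
        for (int[] span : findAll(input, target)) {
            System.out.println(span[0] + ".." + span[1]);
        }
    }
}
```

If the spacing differs between pages, as in the sample table, a single literal will only match the page it was copied from, so each layout would need its own target.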
