Parsing a large text file into chunks in Java

I would like to receive some suggestions regarding a little problem I am going to solve in Java.

I have a file in this format:

@
some text
some text
some text

@
some text
some text
some text

@
some text
some text
some text

...and so on.

I need to read the next chunk of this text file, create an InputStream object consisting of that chunk, and pass the InputStream to a parser. I have to repeat these operations for every chunk in the text file. Each chunk is the text between two lines starting with @. In other words, the problem is to parse each section between the @ markers using a parser that reads each chunk from an InputStream.

The text file could be big, so I would like to obtain good performance.

How could I solve this problem?

I have thought about doing something like this:

    FileReader fileReader = new FileReader(file);

    BufferedReader bufferedReader = new BufferedReader(fileReader);

    Scanner scanner = new Scanner(bufferedReader);

    scanner.useDelimiter("@");

    List<ParsedChunk> parsedChunks = new ArrayList<ParsedChunk>();

    ChunkParser parser = new ChunkParser();

    while(scanner.hasNext())
    {
        String text = scanner.next();

        InputStream inputStream = new ByteArrayInputStream(text.getBytes("UTF-8"));

        ParsedChunk parsedChunk = parser.parse(inputStream);

        parsedChunks.add(parsedChunk);

        inputStream.close();
    }

    scanner.close();

but I am not sure if it would be a good way to do it.

Thank you.

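If I have understood correctly, this is what you are trying to achieve. FYI, you will need Java 7 or later to run the code below, because it uses Files.readAllLines. Note that it reads the whole file into memory before splitting it.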

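    import java.io.File;
    import java.io.IOException;
    import java.nio.charset.Charset;
    import java.nio.file.Files;
    import java.util.ArrayList;
    import java.util.List;

    public class ChunkSplitter {

        public static void main(String[] args) throws IOException {
            List<String> allLines = Files.readAllLines(new File("d:/input.txt").toPath(), Charset.defaultCharset());
            List<List<String>> chunks = getChunks(allLines);
            // Now you have all the chunks and you can process them
        }

        private static List<List<String>> getChunks(List<String> allLines) {
            List<List<String>> result = new ArrayList<List<String>>();
            // Index of the first line after the most recent "@" line; -1 until one is seen
            int fromIndex = -1;
            for (int i = 0; i < allLines.size(); i++) {
                if (allLines.get(i).startsWith("@")) {
                    if (fromIndex >= 0) {
                        // Lines between the previous delimiter and this one
                        result.add(allLines.subList(fromIndex, i));
                    }
                    fromIndex = i + 1;
                }
            }
            if (fromIndex >= 0) {
                // The last chunk runs to the end of the file
                result.add(allLines.subList(fromIndex, allLines.size()));
            }
            return result;
        }
    }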
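
Since the question says the file could be big, an alternative to loading every line up front is to read it line by line and hand out one chunk at a time. The following is only a sketch under some assumptions: the class name ChunkStreamer and its methods nextChunkText and nextChunk are hypothetical, a chunk is taken to be every line after a line starting with @ up to the next such line (including trailing blank lines), and chunks are encoded as UTF-8 as in the question's code.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: yields one chunk at a time, so only the current
// chunk is held in memory rather than the whole file.
class ChunkStreamer {
    private final BufferedReader reader;
    private boolean inChunk = false; // true once a line starting with '@' has opened a chunk

    ChunkStreamer(Reader source) {
        this.reader = new BufferedReader(source);
    }

    /** Returns the text of the next chunk, or null when there are no more chunks. */
    String nextChunkText() throws IOException {
        StringBuilder chunk = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.startsWith("@")) {
                if (inChunk) {
                    // This delimiter closes the current chunk and opens the next one
                    return chunk.toString();
                }
                inChunk = true; // first delimiter in the input
            } else if (inChunk) {
                chunk.append(line).append('\n');
            }
            // Lines before the first delimiter are ignored
        }
        if (inChunk) {
            inChunk = false; // end of input closes the last chunk
            return chunk.toString();
        }
        return null;
    }

    /** Same as nextChunkText, but wrapped as an InputStream for a stream-based parser. */
    InputStream nextChunk() throws IOException {
        String text = nextChunkText();
        return text == null ? null : new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        ChunkStreamer streamer = new ChunkStreamer(new StringReader("@\nsome text\n\n@\nmore text\n"));
        String text;
        while ((text = streamer.nextChunkText()) != null) {
            System.out.println("chunk: " + text.trim());
        }
    }
}
```

With a real file, the StringReader would be replaced by a reader on the file, and each stream returned by nextChunk() could be passed to the question's parser.parse(inputStream) in a loop until null is returned.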

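This does not completely solve the problem, but you could try working with chars: store all the characters in a char array and, using a loop with a conditional, break the string every time you encounter '@'.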
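
A rough sketch of that char-based idea might look like the following. The class CharSplitter and the method splitOnAt are hypothetical names. Note the assumption that every '@' is a delimiter: unlike a line-based check, this would also split on an '@' that appears inside the text of a chunk, and it needs all the characters in memory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the char-array approach.
class CharSplitter {

    /** Splits the characters into chunks, starting a new chunk at every '@'. */
    static List<String> splitOnAt(char[] chars) {
        List<String> chunks = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        boolean inChunk = false; // false until the first '@' is seen
        for (char c : chars) {
            if (c == '@') {
                if (inChunk) {
                    chunks.add(current.toString()); // close the previous chunk
                }
                current.setLength(0);
                inChunk = true;
            } else if (inChunk) {
                current.append(c);
            }
        }
        if (inChunk) {
            chunks.add(current.toString()); // last chunk runs to the end of the input
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> chunks = splitOnAt("@\nsome text\n@\nmore text\n".toCharArray());
        System.out.println(chunks.size() + " chunks");
    }
}
```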
