
What's an elegant way to parse this text in Java?

Disclaimer:
The parsing problem described here is very simple. This question does not just ask for a way to do the parsing (that part is almost straightforward); it asks for an elegant way. An elegant solution would probably not first read the input line by line and then parse each line on its own, since that should not be necessary. Is such a solution possible with ready-to-use standard classes?

Question:
I have to parse text of the following form in Java (there are more records than the ones shown here, and records can have many more lines than these examples):

5
Dominik 3 
Markus 3 2
Reiner 1 2
Samantha 4 
Thomas 3
4
Babette 1 4 
Diana 3 4 
Magan 2 
Thomas 2 4 

The first number, n, is the number of lines in the record that directly follows it. Each of those lines consists of a name followed by 0 to n integers.

I thought java.util.Scanner would be a natural choice, but it leads to a nasty problem: when I use hasNextInt() and hasNext() to decide whether a new line has started, I can't tell whether a number I just read is the header of the next record or the last number after the last name of the previous record. Example from above:

...
Thomas 3
4
...

Here, I can't tell whether the 3 and the 4 are headers or belong to Thomas's line.
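To make the ambiguity concrete, here is a small sketch (the class TokenDemo and its method are hypothetical names) showing that Scanner's default delimiter, which matches any run of whitespace including line breaks, hands back the header 4 exactly the same way as Thomas's 3:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class TokenDemo {
    // Collects every token produced by Scanner's default whitespace delimiter.
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "3" ends Thomas's line and "4" starts the next record,
        // but the token stream carries no trace of the line break between them.
        System.out.println(tokens("Thomas 3\n4\nBabette 1 4"));
    }
}
```

Running main prints [Thomas, 3, 4, Babette, 1, 4], so hasNextInt() alone cannot distinguish the two cases.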

Sure, I could first read the input line by line, feed each line into another Scanner, and read it again, but that effectively parses the whole data twice, which looks ugly to me. Is there a better way?
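For comparison, here is a minimal sketch of the two-pass approach described above, assuming the whole input is available as a String (LineThenScanner and the Entry record are hypothetical names; records need Java 16 or later). Each line is read with nextLine() and then tokenized by a fresh Scanner, so a trailing number can never be mistaken for the next header:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class LineThenScanner {
    // Hypothetical result type: one entry per data line of a record.
    record Entry(String name, List<Integer> numbers) {}

    static List<Entry> parse(String text) {
        List<Entry> entries = new ArrayList<>();
        try (Scanner lines = new Scanner(text)) {
            while (lines.hasNextLine()) {
                String header = lines.nextLine().trim();
                if (header.isEmpty()) continue;          // skip blank lines between records
                int count = Integer.parseInt(header);    // number of lines in this record
                for (int i = 0; i < count; i++) {
                    // Second pass: a separate Scanner over just this one line,
                    // so it stops at the end of the line by construction.
                    try (Scanner line = new Scanner(lines.nextLine())) {
                        String name = line.next();
                        List<Integer> numbers = new ArrayList<>();
                        while (line.hasNextInt()) {
                            numbers.add(line.nextInt());
                        }
                        entries.add(new Entry(name, numbers));
                    }
                }
            }
        }
        return entries;
    }
}
```

It works, but as the question says, every data line is scanned twice: once to find its end and once more to split it into tokens.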

What I would need is something like a flag that tells me whether a line break was encountered during the last delimiter-skipping operation.
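Scanner does not expose such a flag, but Scanner.findInLine comes close: it only searches up to the next line separator and returns null, without advancing, when nothing matches on the rest of the current line. Below is a minimal single-pass sketch built on it. This technique does not appear in the answers here; the class name, the Entry record and the integer pattern are assumptions (Java 16+ for the record).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class FindInLineParser {
    // Hypothetical result type: one entry per data line of a record.
    record Entry(String name, List<Integer> numbers) {}

    // Assumed shape of a number on a data line.
    private static final Pattern INT = Pattern.compile("-?\\d+");

    static List<Entry> parse(String text) {
        List<Entry> entries = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNextInt()) {
                int count = scanner.nextInt();           // record header
                for (int i = 0; i < count; i++) {
                    String name = scanner.next();        // skips the preceding line break
                    List<Integer> numbers = new ArrayList<>();
                    String num;
                    // findInLine never looks past the current line, so the
                    // next record's header is left for hasNextInt()/nextInt().
                    while ((num = scanner.findInLine(INT)) != null) {
                        numbers.add(Integer.parseInt(num));
                    }
                    entries.add(new Entry(name, numbers));
                }
            }
        }
        return entries;
    }
}
```

Because findInLine ignores delimiters and skips any text that does not match the pattern, this sketch reads the input once but does not validate malformed lines.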

Instead of reading each line into a separate Scanner, you can read the rest of the line after the name and use String.split, like this:

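import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// scanner is a java.util.Scanner over the input text
while (scanner.hasNextInt()) {
    int count = scanner.nextInt();                 // record header: number of lines that follow
    for (int i = 0; i != count; i++) {
        if (!scanner.hasNext()) throw new IllegalStateException("expected a name");
        String name = scanner.next();              // first token on the line
        List<Integer> numbers = new ArrayList<>();
        // Scanner has no readLine(); nextLine() returns the rest of the current line,
        // which starts with a space (and may be empty), so trim it before splitting
        String rest = scanner.nextLine().trim();
        if (!rest.isEmpty()) {
            for (String numStr : rest.split("\\s+")) {
                numbers.add(Integer.parseInt(numStr));
            }
        }
        // ... do something with name and numbers
    }
}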

This approach avoids having to tell the last integer on a line apart from the first integer on the next line: it calls nextLine() right after reading a name, i.e., in the middle of a line, so the rest of that line is consumed in one go.
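A tiny sketch of that mechanism (RestOfLineDemo is a hypothetical name): after next() returns the name, nextLine() yields only the remainder of that line, so the following nextInt() can only be the next record's header. It also shows why the remainder needs trimming before it is split.

```java
import java.util.Scanner;

class RestOfLineDemo {
    // Returns what each step sees, joined with '|' so it is easy to inspect.
    static String steps(String text) {
        try (Scanner scanner = new Scanner(text)) {
            String name = scanner.next();      // the name, e.g. "Thomas"
            String rest = scanner.nextLine();  // remainder of that line only, e.g. " 3"
            int header = scanner.nextInt();    // necessarily the next record's header
            return name + "|" + rest + "|" + header;
        }
    }

    public static void main(String[] args) {
        System.out.println(steps("Thomas 3\n4\nBabette 1 4")); // prints "Thomas| 3|4"
    }
}
```

For a line that holds only a name, such as "Diana\n1\n...", the remainder is the empty string.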

Read the file using FileReader and BufferedReader and then start checking:

outer loop: while readLine() does not return null
  if the line matches \d+, parse the number and store it in count
  inner loop: from 0 to count, do what you want to do with each following line
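import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;

// inside a method declared to throw IOException
File file = new File("records.txt");
try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
    String line;
    /* Read the file one line at a time; each outer iteration starts at a header line */
    while ((line = reader.readLine()) != null) {
        if (line.trim().isEmpty()) continue;             // tolerate blank lines between records
        int noOfRecords = Integer.parseInt(line.trim()); // header: number of lines that follow
        /* Read the next n lines in an inner loop */
        while (noOfRecords != 0) {
            line = reader.readLine();
            if (line == null) throw new IllegalStateException("record ended early");
            String[] tokens = line.trim().split("\\s+"); // tokens[0] is the name, the rest are numbers
            noOfRecords--;
            // do what you need to do with the name and numbers
        }
    }
}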

Here we read one line at a time, so the first line we read is an int (call it n), and an inner loop then reads the next n lines. When the inner loop finishes, control returns to the outer loop, and the next line read is guaranteed to be another int or EOF. That way you don't have to guess whether a number is a header, and every line is read only once :)
