Java: using ByteBuffer to parse data from a file

In Java, I want to parse a file with heterogeneous data (numbers and characters), and do it fast.

I've been reading about ByteBuffer and memory mapped files.

I can copy the file into a buffer, but parsing the data is where it gets tricky. I'd like to do it by reading a varying number of bytes at a time, but doesn't that make it dependent on the encoding?

If the format of the file is, for instance:

someString 8
some other string 88

How can I parse it to String or Integer objects?

Thanks!

Udo.

Assuming your format is something like

{string possibly with spaces} {integer}\r?\n

You need to search for the newline, then work backward until you find the last space on the line. You can decode the number yourself into an int, or turn those bytes into a String and parse it. I wouldn't use an Integer object unless you had to. Now that you know where the line starts and where the integer starts, you can extract the text as bytes and convert it into a String using your desired encoding.

This assumes that newline and space are one byte each in your encoding. It would be more complicated if they are multi-byte, but it can still be done.
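
Whether the one-byte assumption holds depends on the charset. As a minimal sketch (the class and method names are made up for illustration), the following compares the width of the space separator in UTF-8 and UTF-16LE, and checks that the bytes of multi-byte UTF-8 characters all have the high bit set, so a byte-wise scan for ' ' or '\n' cannot land in the middle of another character:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class SeparatorWidthDemo {
    /** Number of bytes the given text occupies in the given charset. */
    static int encodedLength(String text, Charset cs) {
        return text.getBytes(cs).length;
    }

    /** Counts bytes in the UTF-8 encoding of text that fall in the ASCII range (0x00-0x7F). */
    static int asciiRangeBytes(String text) {
        int count = 0;
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if (b >= 0) count++; // Java bytes are signed: a set high bit means a negative value
        }
        return count;
    }

    public static void main(String[] args) {
        // Width of the separator in each charset
        System.out.println(encodedLength(" ", StandardCharsets.UTF_8));
        System.out.println(encodedLength(" ", StandardCharsets.UTF_16LE));
        // Two non-ASCII characters (e-acute and a CJK ideograph), written as escapes
        System.out.println(asciiRangeBytes("\u00e9\u4e2d"));
    }
}
```

This prints 1, 2 and 0: the byte scan is safe for UTF-8 (and for single-byte charsets), while UTF-16 data would need a two-byte comparison.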

EDIT: The following example prints...

text: ' someString', number: 8
text: 'some other string', number: -88

Code

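import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Encode explicitly rather than relying on the platform default charset.
ByteBuffer bb = ByteBuffer.wrap(" someString 8\r\nsome other string -88\n".getBytes(StandardCharsets.UTF_8));
while (bb.remaining() > 0) {
    int start = bb.position(), end, ptr;
    // find the end of the line (or of the buffer, if the last line has no newline)
    for (end = start; end < bb.limit(); end++) {
        byte b = bb.get(end);
        if (b == '\r' || b == '\n')
            break;
    }
    // read the number backwards, least significant digit first
    long value = 0;
    long tens = 1;
    for (ptr = end - 1; ptr >= start; ptr--) {
        byte b = bb.get(ptr);
        if (b >= '0' && b <= '9') {
            value += tens * (b - '0');
            tens *= 10;
        } else if (b == '-') {
            value = -value;
            ptr--;
            break;
        } else {
            break;
        }
    }
    // assume the separator is a single space; ptr now points at it
    byte[] bytes = new byte[Math.max(0, ptr - start)];
    bb.get(bytes);
    // StandardCharsets avoids the checked UnsupportedEncodingException of new String(bytes, "UTF-8")
    String text = new String(bytes, StandardCharsets.UTF_8);
    System.out.println("text: '" + text + "', number: " + value);

    // skip the line terminator (\r\n, \n or \r) without running past the limit
    if (end < bb.limit() && bb.get(end) == '\r') end++;
    if (end < bb.limit() && bb.get(end) == '\n') end++;
    bb.position(end);
}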
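
Since the question mentions memory-mapped files, here is a sketch of the same backward scan applied to a file mapped with FileChannel.map. The class name, method names and the temporary sample file are assumptions for illustration. It uses absolute get(index) calls only, and it assumes the file fits in a single mapping (a single map call cannot exceed Integer.MAX_VALUE bytes).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

class MappedLineParser {
    /** Parses "{text} {integer}" lines with absolute gets, leaving the buffer's position untouched. */
    static List<String> parse(ByteBuffer bb) {
        List<String> out = new ArrayList<>();
        int pos = bb.position();
        final int limit = bb.limit();
        while (pos < limit) {
            // find the end of the line (or of the buffer)
            int end = pos;
            while (end < limit && bb.get(end) != '\r' && bb.get(end) != '\n') end++;

            // read the number backwards
            long value = 0, tens = 1;
            int ptr = end - 1;
            while (ptr >= pos) {
                byte b = bb.get(ptr);
                if (b >= '0' && b <= '9') {
                    value += tens * (b - '0');
                    tens *= 10;
                    ptr--;
                } else {
                    if (b == '-') { value = -value; ptr--; }
                    break;
                }
            }

            if (end > pos) { // skip blank lines
                // ptr is at the separator space; everything before it is the text
                byte[] bytes = new byte[Math.max(0, ptr - pos)];
                for (int i = 0; i < bytes.length; i++) bytes[i] = bb.get(pos + i);
                String text = new String(bytes, StandardCharsets.UTF_8);
                out.add("text: '" + text + "', number: " + value);
            }

            // step over \r\n, \n or \r
            if (end < limit && bb.get(end) == '\r') end++;
            if (end < limit && bb.get(end) == '\n') end++;
            pos = end;
        }
        return out;
    }

    /** Maps the whole file read-only and parses it. */
    static List<String> parseFile(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return parse(map);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("lines", ".txt"); // hypothetical sample file
        tmp.toFile().deleteOnExit();
        Files.write(tmp, " someString 8\r\nsome other string -88\n".getBytes(StandardCharsets.UTF_8));
        parseFile(tmp).forEach(System.out::println);
    }
}
```

Returning formatted strings keeps the sketch short; real code would more likely hand each text and number to a callback or collect them into its own type.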

You can try it this way:

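import java.text.CharacterIterator;
import java.text.StringCharacterIterator;

// sb is assumed to be a StringBuffer (or StringBuilder) holding the text to parse
CharacterIterator it = new StringCharacterIterator(sb.toString());
for (char c = it.first(); c != CharacterIterator.DONE; c = it.next()) {
    if (Character.isDigit(c)) {
        // character is a digit
    } else {
        // character is not a digit
    }
}

Or you can use a regex if you prefer:

String str = sb.toString();
String numbers = str.replaceAll("\\D", "");     // keep only the digits
String letters = str.replaceAll("\\P{L}", "");  // keep only the letters (\\W would also keep digits and '_')

Then call Integer.parseInt() as usual on the numbers string. Note that replaceAll concatenates every digit it finds, so apply this to one line at a time rather than to the whole file.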
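
If the regex route appeals, a single pattern with two capture groups can split a line of the question's format directly. This is only a sketch under assumptions: the class name is invented, each line has already been stripped of its line terminator, and the number fits in an int.

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexLineParser {
    // Greedy (.*) grabs as much as it can, so the integer group is the last token on the line.
    private static final Pattern LINE = Pattern.compile("(.*) (-?\\d+)");

    /** Splits "{text} {integer}" into its two parts, or throws if the line does not match. */
    static Map.Entry<String, Integer> parseLine(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected line: " + line);
        }
        return new AbstractMap.SimpleImmutableEntry<String, Integer>(
                m.group(1), Integer.parseInt(m.group(2)));
    }

    public static void main(String[] args) {
        for (String line : new String[] {"someString 8", "some other string 88"}) {
            Map.Entry<String, Integer> e = parseLine(line);
            System.out.println("text: '" + e.getKey() + "', number: " + e.getValue());
        }
    }
}
```

Unlike the two replaceAll calls, this keeps spaces inside the text and keeps the sign of the number.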

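Are you looking for java.util.Scanner? Unless you have really exotic performance requirements, that should be fast enough:

    import java.io.File;
    import java.util.Scanner;

    // Note: next() reads a single whitespace-delimited token, so this works
    // only when the label contains no spaces.
    try (Scanner s = new Scanner(new File("C:\\test.txt"))) {
        while (s.hasNext()) {
            String label = s.next();
            int number = s.nextInt();

            System.out.println(number + " " + label);
        }
    }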
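
For labels that contain spaces, such as "some other string 88" in the question, one possible variation reads whole lines and splits each at its last space. This is a sketch, not part of the original answer: the class name is invented, the Scanner reads from a String to keep the example self-contained (a File works the same way), and lines are assumed to have no trailing whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerLineParser {
    /** Reads "{label} {integer}" lines and returns them as "number label" strings. */
    static List<String> parse(Scanner s) {
        List<String> out = new ArrayList<>();
        while (s.hasNextLine()) {
            String line = s.nextLine();      // the line terminator is already stripped
            int sep = line.lastIndexOf(' '); // the number is whatever follows the last space
            if (sep < 0) continue;           // skip blank lines and lines without a separator
            String label = line.substring(0, sep);
            int number = Integer.parseInt(line.substring(sep + 1));
            out.add(number + " " + label);
        }
        return out;
    }

    public static void main(String[] args) {
        try (Scanner s = new Scanner("someString 8\nsome other string 88\n")) {
            parse(s).forEach(System.out::println);
        }
    }
}
```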
