
Java - Char Buffer Issue

I have a 1.99 GB character file. I want to extract millions of 100-character subsequences from it at random positions, for example from position 90 to 190, 10 to 110, 50000 to 50100, etc.

I usually do it like this:

    FileChannel channel = new RandomAccessFile(file, "r").getChannel();
    ByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
    Charset chars = Charset.forName("ISO-8859-1");
    CharBuffer cbuf = chars.decode(buffer);
    String sub = cbuf.subSequence(0, 100).toString();

    System.out.println(sub);

But for the 1.99 GB file, the code above throws this error:

    java.lang.IllegalArgumentException
            at java.nio.CharBuffer.allocate(CharBuffer.java:328)
            at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:792)
            at java.nio.charset.Charset.decode(Charset.java:791)

So I used the following code instead:

    FileChannel channel = new RandomAccessFile(file, "r").getChannel();
    CharBuffer cbuf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size()).asCharBuffer();
    String sub = cbuf.subSequence(0, 100).toString();

    System.out.println(sub);

This does not throw the error above, but it prints:

ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹

It should be "011111000000...".

Can anybody explain why this is happening and how to solve it?

I'm just guessing, but I think Charset.decode(ByteBuffer) fails when it tries to allocate a huge CharBuffer for the whole file behind the scenes (the stack trace does show the exception coming from CharBuffer.allocate). However, the decode method only decodes the bytes between the buffer's current position and its limit, so you can restrict it to just the range you need, like this:

    ByteBuffer buffer = ...
    Charset charset = ...

    // Set the limit before the position: position() may not exceed the current limit
    buffer.limit(100);
    buffer.position(0);

    System.out.println(charset.decode(buffer));

With ISO-8859-1, which maps each byte to exactly one character, the CharBuffer returned by the decode method will hold only 100 characters, so nothing file-sized is allocated.
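Applying the same idea to the many random windows in the question, a minimal sketch might look like the following. It assumes the file is ISO-8859-1 (so a byte offset equals a character offset) and that the whole file fits in one mapping, since FileChannel.map accepts at most Integer.MAX_VALUE bytes. The class name WindowDecode and the helper extract are hypothetical. Each call works on a duplicate() of the mapped buffer, so the shared buffer's position and limit are never changed and windows can be requested in any order.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WindowDecode {

    // Assumption: the file is ISO-8859-1, so byte offset == character offset.
    private static final Charset CHARSET = StandardCharsets.ISO_8859_1;

    // Hypothetical helper: decodes only the bytes in [start, end).
    static String extract(ByteBuffer mapped, int start, int end) {
        // duplicate() shares the bytes but has its own position and limit,
        // so the caller's buffer is left untouched.
        ByteBuffer window = mapped.duplicate();
        // Set the limit first: position(start) must not exceed the current limit.
        window.limit(end);
        window.position(start);
        // Charset.decode only reads the bytes between position and limit.
        return CHARSET.decode(window).toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java WindowDecode <file>");
            return;
        }
        try (RandomAccessFile raf = new RandomAccessFile(args[0], "r");
             FileChannel channel = raf.getChannel()) {
            // Assumption: channel.size() <= Integer.MAX_VALUE (a single mapping).
            MappedByteBuffer mapped =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            // Windows may be requested in any order.
            System.out.println(extract(mapped, 90, 190));
            System.out.println(extract(mapped, 10, 110));
        }
    }
}
```

Because the decoder only ever sees one 100-byte window, each call allocates a 100-character buffer rather than one sized for the whole file.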

(On a side note, your second attempt gives garbled output because asCharBuffer() does not decode with any character set: it treats every two bytes of the file as one 16-bit Java char.)
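To illustrate the side note, the small sketch below (the class name AsCharBufferDemo is hypothetical) compares asCharBuffer() with real decoding on the four bytes of "0111". asCharBuffer() is a raw view in which each char is assembled from two bytes in the buffer's byte order (big-endian by default). It also means char index i covers bytes 2i and 2i+1, so the offsets would be wrong even if the characters were right.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

public class AsCharBufferDemo {

    public static void main(String[] args) {
        // Four bytes: 0x30 0x31 0x31 0x31
        byte[] bytes = "0111".getBytes(StandardCharsets.ISO_8859_1);

        // Raw view: two bytes per char, no charset involved.
        CharBuffer view = ByteBuffer.wrap(bytes).asCharBuffer();
        System.out.println(view.length()); // 2
        System.out.printf("U+%04X U+%04X%n", (int) view.get(0), (int) view.get(1)); // U+3031 U+3131

        // Decoding with ISO-8859-1 maps each byte to one char instead.
        CharBuffer decoded = StandardCharsets.ISO_8859_1.decode(ByteBuffer.wrap(bytes));
        System.out.println(decoded); // 0111
    }
}
```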

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. Any question please contact: yoyou2525@163.com.

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM