I have a character file of 1.99 GB. I want to extract millions of sub-sequences from that file at random positions, for example from position 90 to 190, 10 to 110, 50000 to 50100, etc. (each 100 characters long).
I usually do it like this:
FileChannel channel = new RandomAccessFile(file , "r").getChannel();
ByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
Charset chars = Charset.forName("ISO-8859-1");
CharBuffer cbuf = chars.decode(buffer);
String sub = cbuf.subSequence(0, 100).toString();
System.out.println(sub);
But for the 1.99 GB file, the code above throws this error:
java.lang.IllegalArgumentException
at java.nio.CharBuffer.allocate(CharBuffer.java:328)
at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:792)
at java.nio.charset.Charset.decode(Charset.java:791)
So I used the following code instead:
FileChannel channel = new RandomAccessFile(file , "r").getChannel();
CharBuffer cbuf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size()).asCharBuffer() ;
String sub = cbuf.subSequence(0, 100).toString();
System.out.println(sub);
which does not throw the error above, but prints this output:
ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹
It should be "011111000000........".
Can anybody explain why this is happening and how to solve it?
I'm just guessing, but I think Charset.decode(ByteBuffer) fails when it tries to allocate a huge CharBuffer for you behind the scenes: the stack trace shows the exception coming from CharBuffer.allocate, which throws IllegalArgumentException when asked for a negative capacity. The decode method only decodes the bytes between the buffer's current position and its limit, so you can do something like this:
ByteBuffer buffer = ...
Charset charset = ...
buffer.position(0);
buffer.limit(100);
System.out.println(charset.decode(buffer));
The capacity (in characters) of the CharBuffer returned by the decode method will be 100.
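Applied to the arbitrary 100-character windows in the question, the same position/limit idea could look like the sketch below. It assumes the file is ISO-8859-1 text that fits in a single mapping (FileChannel.map accepts at most Integer.MAX_VALUE bytes, which a 1.99 GB file stays under); the class SubsequenceDemo, the helper subsequence and the tiny temporary sample file are hypothetical stand-ins, not part of the original answer.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SubsequenceDemo {

    // Hypothetical helper: decodes only the bytes [start, start + length).
    // duplicate() gives the view its own position/limit, so the shared
    // mapped buffer is left untouched between calls.
    static String subsequence(ByteBuffer mapped, int start, int length, Charset charset) {
        ByteBuffer view = mapped.duplicate();
        view.limit(start + length); // set the limit first: position(start) fails if start > limit
        view.position(start);
        return charset.decode(view).toString(); // allocates a CharBuffer for this window only
    }

    public static void main(String[] args) throws IOException {
        // Small temporary file standing in for the real 1.99 GB input (assumption).
        Path file = Files.createTempFile("chars", ".txt");
        file.toFile().deleteOnExit();
        Files.write(file, "0111110000001234567890".getBytes(StandardCharsets.ISO_8859_1));

        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // One mapping can cover at most Integer.MAX_VALUE bytes.
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            System.out.println(subsequence(buffer, 0, 6, StandardCharsets.ISO_8859_1));   // 011111
            System.out.println(subsequence(buffer, 12, 10, StandardCharsets.ISO_8859_1)); // 1234567890
        }
    }
}
```

Mapping once and decoding per window keeps memory proportional to the window rather than the file. Since ISO-8859-1 maps each byte to exactly one char, copying the window's bytes into a byte[] and calling new String(bytes, StandardCharsets.ISO_8859_1) would be an equivalent alternative that skips the decoder.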
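(On a side note, your second attempt gives erroneous output because ByteBuffer.asCharBuffer() does not decode with any character set: it is a view that treats each pair of bytes as one 16-bit char, in the buffer's byte order (big-endian by default). So two ISO-8859-1 digits such as "01" (bytes 0x30 0x31) come out as the single char U+3031.)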
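A small sketch contrasting the two approaches on four in-memory bytes (the class name AsCharBufferDemo and the sample "0111" are illustrative assumptions, not from the question): asCharBuffer() pairs the bytes up into chars, while decoding with the actual charset yields one char per byte.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

public class AsCharBufferDemo {
    public static void main(String[] args) {
        // Four ISO-8859-1 bytes: 0x30 0x31 0x31 0x31 ("0111")
        ByteBuffer bytes = ByteBuffer.wrap("0111".getBytes(StandardCharsets.ISO_8859_1));

        // asCharBuffer() is a view, not a decoder: each byte pair (big-endian
        // by default) becomes one 16-bit char, so 4 bytes yield only 2 chars.
        CharBuffer view = bytes.asCharBuffer();
        System.out.printf("view: %d chars, first = U+%04X%n", view.remaining(), (int) view.get(0));
        // prints: view: 2 chars, first = U+3031

        // Decoding with the file's actual charset gives one char per byte.
        CharBuffer decoded = StandardCharsets.ISO_8859_1.decode(bytes.duplicate());
        System.out.println("decoded: " + decoded);
        // prints: decoded: 0111
    }
}
```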