
Reading a character at a random position in a file in Java?

When I read from a file using readChar() in the RandomAccessFile class, I get unexpected output: a ? is displayed instead of the expected character.

package tesr;

import java.io.IOException;
import java.io.RandomAccessFile;

public class Test {

    public static void main(String[] args) {
        try {
            RandomAccessFile f = new RandomAccessFile("c:\\ankit\\1.txt", "rw");
            f.seek(0);
            System.out.println(f.readChar());
        } catch (IOException e) {
            System.out.println("dkndknf");
        }
    }
}

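You probably intended readByte. readChar() reads two bytes and combines them, high byte first, into one Java char, which is a 2-byte UTF-16 code unit. On a file that is not UTF-16BE text, the result is usually not the character you expected: it may be an unrelated character, an unassigned code point, or half of a "surrogate" pair (a combination of two chars forming one Unicode code point). When such a char cannot be converted to the console's encoding, Java prints a question mark, which is what you see.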
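To make the two-byte behaviour concrete, here is a minimal sketch. The class name ReadCharDemo, the helper readCharAt and the temporary file are made up for illustration; only RandomAccessFile.seek and readChar come from the question.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class ReadCharDemo {

    // Returns whatever readChar() produces at the given byte offset.
    public static char readCharAt(File file, long offset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek(offset);
            // readChar() consumes TWO bytes and returns (b1 << 8) | b2.
            return raf.readChar();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical test file holding the two ASCII bytes 'a' (0x61) and 'b' (0x62).
        File tmp = File.createTempFile("readchar", ".txt");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), "ab".getBytes(StandardCharsets.US_ASCII));

        char c = readCharAt(tmp, 0);
        // Print the code unit in hex rather than the char itself,
        // so the console encoding cannot hide the result.
        System.out.printf("U+%04X%n", (int) c); // prints U+6162
    }
}
```

U+6162 lies in the CJK Unified Ideographs block, so a console using a Western single-byte encoding has no way to show it and substitutes a question mark.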

If you know which encoding the file is in, then for a single-byte encoding it is simple (in is the open RandomAccessFile):

byte b = in.readByte();
byte[] bs = new byte[] { b };
String s = new String(bs, "Cp1252"); // Some single byte encoding
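Put together with seek, the snippet above could look like the following sketch. The class SingleByteRead and the method readCharAt are hypothetical names, and ISO-8859-1 is used in main only because it is guaranteed to exist on every JVM; pass whatever single-byte charset your file really uses.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class SingleByteRead {

    // Reads one byte at the given offset and decodes it with a single-byte charset.
    // Assumption: every character in the file occupies exactly one byte.
    public static char readCharAt(File file, long offset, Charset singleByteCharset)
            throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek(offset);
            byte b = raf.readByte();
            return new String(new byte[] { b }, singleByteCharset).charAt(0);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file: "h\u00E9llo" ("hello" with an e-acute) stored as ISO-8859-1.
        File tmp = File.createTempFile("single", ".txt");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), "h\u00E9llo".getBytes(StandardCharsets.ISO_8859_1));

        char c = readCharAt(tmp, 1, StandardCharsets.ISO_8859_1);
        System.out.printf("U+%04X%n", (int) c); // prints U+00E9
    }
}
```

With a single-byte encoding the byte offset and the character index are the same number, which is why random access is easy in this case.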

For the variable-length multi-byte UTF-8 it is also simple to identify a sequence of bytes:

  • a single-byte character when the high bit is 0
  • otherwise a continuation byte when the two high bits are 10
  • otherwise a starting byte (with some special cases) whose high bits tell the number of bytes in the sequence.
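The three rules above can be turned into a small helper that, given an arbitrary byte offset, backs up to the starting byte and decodes the whole sequence. This is only a sketch: Utf8Seek, isContinuation, sequenceLength and readCodePointAt are made-up names, and the "special cases" (overlong forms, invalid starting bytes such as 0xC0, truncated sequences) are left to the decoder, which substitutes U+FFFD for them.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class Utf8Seek {

    // Continuation bytes have the bit pattern 10xxxxxx.
    public static boolean isContinuation(int b) {
        return (b & 0xC0) == 0x80;
    }

    // Sequence length announced by a starting byte, or -1 if b cannot start a sequence.
    public static int sequenceLength(int b) {
        if ((b & 0x80) == 0x00) return 1; // 0xxxxxxx
        if ((b & 0xE0) == 0xC0) return 2; // 110xxxxx
        if ((b & 0xF0) == 0xE0) return 3; // 1110xxxx
        if ((b & 0xF8) == 0xF0) return 4; // 11110xxx
        return -1;
    }

    // Decodes the UTF-8 sequence that contains the byte at the given offset.
    public static String readCodePointAt(RandomAccessFile raf, long offset) throws IOException {
        long pos = offset;
        raf.seek(pos);
        int b = raf.readUnsignedByte();
        // A sequence has at most 3 continuation bytes, so back up at most 3 positions.
        while (isContinuation(b) && pos > 0 && offset - pos < 3) {
            pos--;
            raf.seek(pos);
            b = raf.readUnsignedByte();
        }
        int len = sequenceLength(b);
        if (len < 0) {
            throw new IOException("No UTF-8 starting byte found near offset " + offset);
        }
        byte[] bytes = new byte[len];
        raf.seek(pos);
        raf.readFully(bytes); // throws EOFException if the sequence is cut off
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample: "a", the euro sign, "b" -> bytes 61 E2 82 AC 62.
        File tmp = File.createTempFile("utf8", ".txt");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), "a\u20ACb".getBytes(StandardCharsets.UTF_8));

        try (RandomAccessFile raf = new RandomAccessFile(tmp, "r")) {
            // Offset 2 is the middle byte of the euro sign; the helper backs up to offset 1.
            String s = readCodePointAt(raf, 2);
            System.out.printf("U+%04X%n", s.codePointAt(0)); // prints U+20AC
        }
    }
}
```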

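For UTF-16LE and UTF-16BE, the file position must be a multiple of 2 and each char is 2 bytes long (a code point outside the Basic Multilingual Plane takes two chars, a surrogate pair, so 4 bytes).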

byte[] bs = new byte[2];
in.readFully(bs); // read(bs) may return after reading fewer than 2 bytes
String s = new String(bs, StandardCharsets.UTF_16LE);

You almost certainly have a character encoding problem. It is not possible to simply read characters from a file. Instead, an appropriate sequence of bytes must be read, and those bytes are then interpreted according to a character encoding scheme to translate them into a character. When you want to read a file as text, Java must be told, perhaps implicitly, which character encoding to use.

If you tell Java the wrong encoding, you will get gibberish. If you pick an arbitrary point in a file and start reading, and that location is not the start of the encoding of a character, you will also get gibberish. One or both of those have happened in your case.
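As a rough illustration of the first failure mode, the sketch below decodes the same bytes once with the right charset and once with the wrong one. ExplicitCharsetRead and readFirstLine are hypothetical names; the sample text is an assumption chosen so that the difference is easy to see.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitCharsetRead {

    // Reads the first line of a file, decoding its bytes with the charset given explicitly.
    public static String readFirstLine(Path path, Charset charset) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(path), charset))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample: "caf\u00E9" written as UTF-8 -> bytes 63 61 66 C3 A9.
        Path tmp = Files.createTempFile("charset", ".txt");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, "caf\u00E9".getBytes(StandardCharsets.UTF_8));

        String right = readFirstLine(tmp, StandardCharsets.UTF_8);
        String wrong = readFirstLine(tmp, StandardCharsets.ISO_8859_1);

        // Same bytes, different interpretation: the wrong charset turns the
        // two-byte sequence C3 A9 into two separate characters.
        System.out.println(right.length()); // prints 4
        System.out.println(wrong.length()); // prints 5
    }
}
```

The lengths are printed instead of the strings themselves because the console applies its own encoding on output, which is a second place where a question mark can appear.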


 