Read string from binary file in Java
I have read every page I could find on the web, but none of them work for me.
I have a binary file that was created by C code, and I also have the C reader for it. I need to write a Java reader for this binary file.
In the C code, the following call reads one string into vocab at offset 'b * max_w', followed by a single character:
fscanf(f, "%s%c", &vocab[b * max_w], &ch);
In Java I open the binary file:
FileInputStream fis = new FileInputStream(filename);
BufferedInputStream bin = new BufferedInputStream(fis);
Then I read bytes and convert them into a string:
for(int j = 0; j < 200; j++) {
    int size = 2; // char is 2 bytes
    byte[] tempId3 = new byte[size];
    bin.read(tempId3, 0, size);
    String id3 = new String ( tempId3 );
    System.out.println( " id = " + id3 );
}
But the output is a bunch of nonsense. Am I doing something wrong? Can I do better?
Edit: The working C snippet from here is:
fscanf(f, "%lld", &words);
fscanf(f, "%lld", &size);
vocab = (char *)malloc((long long)words * max_w * sizeof(char));
for (a = 0; a < N; a++) bestw[a] = (char *)malloc(max_size * sizeof(char));
Here is what I have:
FileInputStream fis = new FileInputStream(filename);
BufferedInputStream bin = new BufferedInputStream(fis);
int length = 1;
System.out.println("1st: ");
byte[] tempId = new byte[8];
bin.read(tempId, 0, 8);
String id = new String ( tempId, "US-ASCII" );
System.out.println( " out = " + id );
System.out.println("2nd: ");
int size1 = 8;
byte[] tempId2 = new byte[size1];
bin.read(tempId2, 0, size1);
String id2 = new String ( tempId2, "US-ASCII");
System.out.println( " out = " + id2 );
for(int j = 0; j < 20; j++) {
    int size = 2;
    byte[] tempId3 = new byte[size];
    bin.read(tempId3, 0, size);
    String id3 = new String ( tempId3, "US-ASCII" );
    System.out.println( " out = " + id3 );
}
In the output I see, everything except the first two 'long' numbers is nonsense (I expected characters).
PS. The C code is here (lines 44-60 are the part that reads the binary file).
Maybe using a Reader would get you what you need? With an InputStream you are reading binary data; Readers are for strings.
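A minimal sketch of the Reader route, assuming the part of the file being read is plain ASCII text that starts with two decimal numbers followed by a word (the class name ReaderDemo, the method readHeaderAndWord, and the sample data are made up for illustration). It wraps the stream in an InputStreamReader with an explicit charset and lets java.util.Scanner split on whitespace, which is roughly what fscanf does with "%lld" and "%s".

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class ReaderDemo {
    // Parse two leading decimal numbers and the first word from a character stream.
    // Returns them as strings: { words, size, firstWord }.
    static String[] readHeaderAndWord(Reader reader) {
        Scanner sc = new Scanner(reader);
        long words = sc.nextLong(); // rough equivalent of fscanf(f, "%lld", &words)
        long size = sc.nextLong();  // rough equivalent of fscanf(f, "%lld", &size)
        String word = sc.next();    // rough equivalent of fscanf(f, "%s", ...)
        return new String[] { Long.toString(words), Long.toString(size), word };
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for the start of the real file.
        byte[] sample = "71291 200\nhello world\n".getBytes(StandardCharsets.US_ASCII);
        Reader reader = new InputStreamReader(
                new ByteArrayInputStream(sample), StandardCharsets.US_ASCII);
        String[] parsed = readHeaderAndWord(reader);
        System.out.println(parsed[0] + " " + parsed[1] + " " + parsed[2]);
    }
}
```

One caveat: both the Reader and the Scanner read ahead and buffer, so this approach only fits if everything being read is text. If raw binary values are interleaved with the words, reading bytes directly from the InputStream is likely the safer choice.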
You can try to use a constructor like this one, and try a different charset. A Java string is encoded in UTF-16, so one character takes 2 bytes, which could be why it doesn't work. Try US-ASCII, for example.
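To illustrate the charset point, here is a small sketch (the class name DecodeDemo, its two helper methods, and the sample bytes are made up for illustration). It decodes the same four bytes once as US-ASCII and once as UTF-16, showing that C's one-byte chars only come back as the expected text when decoded with a single-byte charset.

```java
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Decode bytes written by C code (one byte per char) as US-ASCII.
    static String decodeAscii(byte[] raw) {
        return new String(raw, StandardCharsets.US_ASCII);
    }

    // Decode the same bytes as big-endian UTF-16: every two bytes become one char.
    static String decodeUtf16(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        byte[] raw = "hell".getBytes(StandardCharsets.US_ASCII); // 4 bytes
        System.out.println(decodeAscii(raw));          // 4 characters
        System.out.println(decodeUtf16(raw).length()); // 2 characters, unrelated to the text
    }
}
```

Note that the charset only affects decoding. Reading a fixed 2 bytes per iteration still cuts the data into arbitrary 2-byte chunks, so a word of any other length will be split across several strings.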
Strings in Java are Unicode. You have to take care of that. What encoding does your binary file use?
String id3 = new String(tempId3, "US-ASCII");
As was said in other comments, try to use a String constructor with a character encoding. That is:
String id3 = new String(tempId3, Charsets.US_ASCII);
Or:
String id3 = new String(tempId3, "US-ASCII");
Other lines may remain untouched.
In the C code you have posted there is no actual reading of characters. There is only memory allocation for the later scanning process.
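For the scanning itself, note that fscanf with "%lld" and "%s" parses text: the two leading numbers are stored as decimal digits, not as 8-byte binary longs, and each word runs up to the next whitespace byte. Below is a rough byte-level sketch of that behaviour in Java. The class name TokenReader, its methods, and the sample data are hypothetical, and it assumes the numbers and words are ASCII and separated by whitespace. What follows each word in the real file is not shown in the posted snippets, so that part is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

public class TokenReader {
    // Same set of characters that C's isspace() accepts in the "C" locale.
    static boolean isSpace(int c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == 0x0B || c == '\f';
    }

    // Rough equivalent of fscanf(f, "%s", ...): skip leading whitespace, then
    // collect bytes up to (not including) the next whitespace byte or EOF.
    static String readToken(PushbackInputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int c = in.read();
        while (c != -1 && isSpace(c)) {
            c = in.read();
        }
        while (c != -1 && !isSpace(c)) {
            buf.write(c);
            c = in.read();
        }
        if (c != -1) {
            in.unread(c); // leave the delimiter in the stream for a following "%c"
        }
        return new String(buf.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample data standing in for the start of the real file.
        byte[] sample = "71291 200\nhello world\n".getBytes(StandardCharsets.US_ASCII);
        PushbackInputStream in = new PushbackInputStream(new ByteArrayInputStream(sample));

        long words = Long.parseLong(readToken(in)); // fscanf(f, "%lld", &words)
        long size = Long.parseLong(readToken(in));  // fscanf(f, "%lld", &size)
        String first = readToken(in);               // the "%s" part of "%s%c"
        int ch = in.read();                         // the "%c" part: the delimiter byte

        System.out.println(words + " " + size + " " + first + " " + ch);
    }
}
```

With a real file, the PushbackInputStream would wrap the BufferedInputStream from the question instead of a ByteArrayInputStream. Because this reads bytes directly and pushes back at most one, any binary data after a word can still be read from the same stream afterwards.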