
Read string from binary file in Java

I have read every page I could find on the web, but none of them worked for me.

I have a binary file created by C code. I also have the C reader for this binary file, and I need to write a Java reader for it.

In the C code, the following call reads one whitespace-delimited string into vocab at offset 'b * max_w', followed by a single character:

fscanf(f, "%s%c", &vocab[b * max_w], &ch);
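A rough Java counterpart of that fscanf call might look like the sketch below. It assumes the words are stored as one-byte ASCII characters (a C char is one byte); the names ScanfWord, readWord and isSpace are hypothetical. '%s' skips leading whitespace and stops at the next whitespace byte, and '%c' then consumes that byte, so the sketch reads and discards the separator the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ScanfWord {
    // Rough counterpart of fscanf(f, "%s%c", word, &ch), assuming ASCII text.
    static String readWord(InputStream in) throws IOException {
        int b = in.read();
        while (b != -1 && isSpace(b)) {   // %s skips leading whitespace
            b = in.read();
        }
        if (b == -1) {
            throw new EOFException("no word left in stream");
        }
        ByteArrayOutputStream word = new ByteArrayOutputStream();
        while (b != -1 && !isSpace(b)) {  // %s stops at the next whitespace byte
            word.write(b);
            b = in.read();
        }
        // b now holds the byte that %c would store in ch (or -1 at end of file);
        // it has already been consumed, as in the C code.
        return new String(word.toByteArray(), StandardCharsets.US_ASCII);
    }

    // Whitespace as defined by C's isspace() in the "C" locale.
    static boolean isSpace(int b) {
        return b == ' ' || b == '\t' || b == '\n' || b == '\r' || b == 0x0B || b == 0x0C;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample bytes; a real reader would wrap a FileInputStream.
        byte[] sample = "  hello world\n".getBytes(StandardCharsets.US_ASCII);
        InputStream in = new ByteArrayInputStream(sample);
        System.out.println(readWord(in)); // hello
        System.out.println(readWord(in)); // world
    }
}
```

Non-ASCII bytes would decode to the replacement character U+FFFD here; if the file uses another encoding, swap the charset accordingly.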

In Java I open the binary file:

FileInputStream fis = new FileInputStream(filename);  
BufferedInputStream bin = new BufferedInputStream(fis);

Then I read bytes and convert them into a string:

for (int j = 0; j < 200; j++) {
    int size = 2; // char is 2 bytes
    byte[] tempId3 = new byte[size];
    bin.read(tempId3, 0, size);
    String id3 = new String(tempId3);
    System.out.println(" id = " + id3);
}

But the output is a bunch of nonsense. Am I doing something wrong? Can I do better?
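Two things in the loop above are worth checking: InputStream.read(byte[], int, int) may return fewer bytes than requested and its return value is ignored, and new String(byte[]) decodes with the platform default charset. A minimal sketch that avoids both, assuming the bytes are ASCII text, is below. DataInputStream.readFully and the String(byte[], Charset) constructor are standard Java; ReadChunk and readAscii are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class ReadChunk {
    // Reads exactly n bytes (readFully loops until the buffer is full, or
    // throws EOFException) and decodes them as US-ASCII, one byte per char.
    static String readAscii(DataInputStream in, int n) throws IOException {
        byte[] buf = new byte[n];
        in.readFully(buf);
        return new String(buf, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample bytes standing in for the file contents.
        byte[] sample = "abcdef".getBytes(StandardCharsets.US_ASCII);
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(sample));
        System.out.println(readAscii(in, 4)); // abcd
        System.out.println(readAscii(in, 2)); // ef
    }
}
```

For the real file, the stream could be built as new DataInputStream(new BufferedInputStream(new FileInputStream(filename))).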

Edit: The working C snippet (from here) is:

fscanf(f, "%lld", &words);
fscanf(f, "%lld", &size);
vocab = (char *)malloc((long long)words * max_w * sizeof(char));
for (a = 0; a < N; a++) bestw[a] = (char *)malloc(max_size * sizeof(char));
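Note that fscanf with "%lld" parses a decimal number written as text; it does not read an 8-byte binary long. A simplified Java counterpart is sketched below, assuming the header consists of plain ASCII digits separated by whitespace. ReadHeader and readDecimal are hypothetical names, and '+' signs and overflow are not handled.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ReadHeader {
    // Simplified counterpart of fscanf(f, "%lld", &x) for plain decimal text:
    // skip whitespace, accept an optional '-', then parse ASCII digits.
    // Unlike C, the byte that ends the number is consumed rather than pushed
    // back; that is harmless when the next read skips whitespace anyway.
    static long readDecimal(InputStream in) throws IOException {
        int b = in.read();
        while (b == ' ' || b == '\t' || b == '\n' || b == '\r') {
            b = in.read();
        }
        boolean negative = false;
        if (b == '-') {
            negative = true;
            b = in.read();
        }
        if (b < '0' || b > '9') {
            throw new IOException("expected a decimal digit, got byte " + b);
        }
        long value = 0;
        while (b >= '0' && b <= '9') {
            value = value * 10 + (b - '0');
            b = in.read();
        }
        return negative ? -value : value;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample header; a real file would come from FileInputStream.
        byte[] sample = "3 200\n".getBytes(StandardCharsets.US_ASCII);
        InputStream in = new ByteArrayInputStream(sample);
        long words = readDecimal(in);
        long size = readDecimal(in);
        System.out.println(words + " " + size); // 3 200
    }
}
```

Because the numbers are stored as text, the header's length in bytes depends on how many digits they have, so reading a fixed 8 bytes per number only lines up by coincidence.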

Here is what I have:

FileInputStream fis = new FileInputStream(filename);
BufferedInputStream bin = new BufferedInputStream(fis);

int length = 1;

System.out.println("1st: ");
byte[] tempId = new byte[8];
bin.read(tempId, 0, 8);
String id = new String(tempId, "US-ASCII");
System.out.println(" out = " + id);

System.out.println("2nd: ");
int size1 = 8;
byte[] tempId2 = new byte[size1];
bin.read(tempId2, 0, size1);
String id2 = new String(tempId2, "US-ASCII");
System.out.println(" out = " + id2);

for (int j = 0; j < 20; j++) {
    int size = 2;
    byte[] tempId3 = new byte[size];
    bin.read(tempId3, 0, size);
    String id3 = new String(tempId3, "US-ASCII");
    System.out.println(" out = " + id3);
}

The output I see is below. Except for the first two 'long' numbers, the rest is nonsense (I expected characters).

[Output screenshot]

PS. The C code is here (lines 44-60 are the part that reads the binary file).

Maybe a Reader would get you what you need? With an InputStream you are reading binary data; Readers are meant for text.
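A sketch of the Reader approach, assuming the content is plain ASCII text, is below (ReaderSketch and readLines are hypothetical names). One caveat: InputStreamReader and BufferedReader read ahead from the underlying stream, so if text is followed by binary data in the same file, you cannot switch back to the raw stream at the exact byte where the text ended.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ReaderSketch {
    // Decodes the stream as US-ASCII text and returns it line by line.
    static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.US_ASCII))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample text; for a real file pass new FileInputStream(filename).
        byte[] sample = "2 3\nfoo bar\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(readLines(new ByteArrayInputStream(sample))); // [2 3, foo bar]
    }
}
```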

You can try a String constructor that takes a charset and experiment with different charsets; try US-ASCII, for example. Java strings are encoded in UTF-16, so a Java char takes 2 bytes, while a C char takes only 1. That mismatch could be why your code doesn't work.
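To make the size difference concrete: a C char is one byte (sizeof(char) is 1), while a Java char is a 16-bit UTF-16 code unit. The sketch below (CharWidth is a hypothetical name) decodes three bytes as a C program would store "cat", then shows that the same text in UTF-16 needs twice as many bytes. A loop that reads 2 bytes per character is therefore consuming two C characters at a time.

```java
import java.nio.charset.StandardCharsets;

public class CharWidth {
    public static void main(String[] args) {
        // Bytes as a C program would store the word "cat": one byte per char.
        byte[] fromC = {'c', 'a', 't'};
        String s = new String(fromC, StandardCharsets.US_ASCII);
        System.out.println(s + " " + s.length()); // cat 3

        // The same text in UTF-16 (big-endian, no byte-order mark): 2 bytes per char.
        System.out.println("cat".getBytes(StandardCharsets.UTF_16BE).length); // 6
    }
}
```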

Strings in Java are Unicode, and you have to take that into account. What encoding does your binary file use?

String id3 = new String(tempId3, "US-ASCII");

As other comments have said, try a String constructor that takes a character encoding. That is:

String id3 = new String(tempId3, StandardCharsets.US_ASCII);

Or:

String id3 = new String(tempId3, "US-ASCII");

The other lines can stay as they are.
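The two constructors differ in how they fail. String(byte[], Charset), used with java.nio.charset.StandardCharsets (Java 7+), throws no checked exception, while String(byte[], String) throws UnsupportedEncodingException when the charset name is unknown; the standard name is "US-ASCII" with a hyphen. A small sketch (CharsetCtor, viaCharset and viaName are hypothetical names):

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class CharsetCtor {
    // Charset overload: no checked exception, and no risk of a misspelled name.
    static String viaCharset(byte[] bytes) {
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    // Name overload: throws UnsupportedEncodingException for an unknown name.
    static String viaName(byte[] bytes) throws UnsupportedEncodingException {
        return new String(bytes, "US-ASCII");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        byte[] bytes = {'w', 'o', 'r', 'd'};
        System.out.println(viaCharset(bytes).equals(viaName(bytes))); // true
    }
}
```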

The C code you posted does not actually read any characters; it only allocates memory for the scanning that happens later.
