
How to read a Unicode-encoded file in Java

I am trying to read a file that has been encoded in Unicode (I used EditPlus to find out its encoding).

I am using the following code:

InputStream inStream = new FileInputStream(logFile);
InputStreamReader streamReader = new InputStreamReader(inStream, "Unicode");
final BufferedReader reader = new BufferedReader(streamReader);

But it does not read the file correctly. When I tried "UTF-8", it read the file, but the output contained a space after every character.

I need to read a file and display its contents in a JList. I searched and found the following:

Unicode characters use 2 bytes. With ASCII text every other byte will be a binary 0, which will display as a ? or a square in most text editors.

This is similar to what is happening to me. I do not know much about encoding.

Any help would be really appreciated.

I'm not sure what endianness "Unicode" gives, but you should try "UTF-16BE" and "UTF-16LE". BE is big-endian and LE is little-endian, which just determines which byte comes first in each 16-bit code unit.
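To make the byte-order point concrete, here is a small sketch that prints the bytes produced for the letter "A" under each charset. The class name Utf16ByteOrderDemo and the hex helper are made up for illustration; only the StandardCharsets constants come from the JDK.

```java
import java.nio.charset.StandardCharsets;

public class Utf16ByteOrderDemo {
    // Formats bytes as space-separated two-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "A"; // U+0041, a single 16-bit code unit

        // Big-endian: the high (zero) byte comes first.
        System.out.println(hex(text.getBytes(StandardCharsets.UTF_16BE))); // 00 41

        // Little-endian: the low byte comes first.
        System.out.println(hex(text.getBytes(StandardCharsets.UTF_16LE))); // 41 00
    }
}
```

The zero byte next to every ASCII letter is probably what shows up as a gap after each character when such a file is decoded with the wrong charset.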

(I've just read that "UTF-16" defaults to big-endian, so I suspect "Unicode" does too. That would mean "UTF-16LE" is more likely to work.)
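A minimal sketch of that suggestion, assuming the file really is little-endian UTF-16, might look like the following. The class name Utf16FileReader, the readLines helper, and the temporary sample file are hypothetical and exist only so the example runs on its own; in the question's code, the only change needed would be the charset passed to InputStreamReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class Utf16FileReader {
    // Reads all lines from the file, decoding with the given charset.
    static List<String> readLines(Path file, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // The UTF-16LE/UTF-16BE decoders keep a leading byte-order mark
                // as U+FEFF, so drop it from the first line if present.
                if (lines.isEmpty() && line.startsWith("\uFEFF")) {
                    line = line.substring(1);
                }
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file, written here so the sketch is self-contained.
        Path logFile = Files.createTempFile("sample", ".log");
        Files.write(logFile,
                "first line\r\nsecond line\r\n".getBytes(StandardCharsets.UTF_16LE));

        for (String line : readLines(logFile, StandardCharsets.UTF_16LE)) {
            System.out.println(line);
        }
        Files.delete(logFile);
    }
}
```

If the file starts with a byte-order mark, passing the plain "UTF-16" charset should also work, since Java's decoder for it uses the mark to pick the byte order and falls back to big-endian only when no mark is present.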
