
Write to a file with a specific encoding in Java

This might be related to my previous question (on how to convert "fÃ¶r" to "för").

So I have a file that I create in my code. Right now I create it with the following code:

FileWriter fwOne = new FileWriter(wordIndexPath);
BufferedWriter wordIndex = new BufferedWriter(fwOne);

followed by a few

wordIndex.write(wordBuilder.toString()); //that's a StringBuilder

ending (after a while-loop) with a

wordIndex.close();

Now the problem is that later on this file is huge, and I want (need) to jump around in it without going through the entire file. The seek(long pos) method of RandomAccessFile lets me do this.

Here's my problem: the characters in the file I've created seem to be encoded in UTF-8, and the only information I have when I seek is the character position I want to jump to. seek(long pos), on the other hand, jumps in bytes, so I don't end up in the right place, since a UTF-8 character can be more than one byte.

Here's my question: can I, when I write the file, write it in ISO-8859-15 instead (where a character is a byte)? That way seek(long pos) will get me to the right position. Or should I instead try to use an alternative to RandomAccessFile (is there an alternative where you can jump to a character position)?
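To make the mismatch concrete, here is a minimal sketch that compares how many bytes the same string occupies in UTF-8 and in ISO-8859-15. The class and method names are made up for illustration, and it assumes the JVM provides the ISO-8859-15 charset, as common JDKs do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ByteLengthDemo {
    // Hypothetical helper: number of bytes the text occupies in the given charset.
    public static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String word = "f\u00f6r"; // "för": three chars
        Charset latin9 = Charset.forName("ISO-8859-15");
        System.out.println(byteLength(word, StandardCharsets.UTF_8)); // 4, because 'ö' takes two bytes
        System.out.println(byteLength(word, latin9));                 // 3, one byte per char
    }
}
```

With UTF-8, seeking to character position 2 would land in the middle of 'ö'; with the single-byte encoding the character position and the byte position coincide.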

First, the worrisome part. FileWriter and FileReader are old utility classes that use the platform default encoding of the computer they run on. Run elsewhere, the same code may produce a different file, or fail to read correctly a file written on another machine.
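One way to avoid depending on the platform default is to build the writer from an OutputStreamWriter with an explicit charset. The following is only a sketch: the class name, the open helper, and the file name wordIndex.txt are assumptions for illustration, not part of the original code.

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;

class ExplicitCharsetWriter {
    // Same shape as FileWriter + BufferedWriter, but the encoding is stated explicitly.
    public static BufferedWriter open(String path, Charset charset) throws IOException {
        return new BufferedWriter(new OutputStreamWriter(new FileOutputStream(path), charset));
    }

    public static void main(String[] args) throws IOException {
        // Shows what FileWriter would have used on this machine.
        System.out.println("Platform default: " + Charset.defaultCharset());
        // "wordIndex.txt" is a placeholder path for this sketch.
        try (BufferedWriter wordIndex = open("wordIndex.txt", Charset.forName("ISO-8859-15"))) {
            wordIndex.write("f\u00f6r");
        }
    }
}
```

The rest of the writing code (write and close calls on the BufferedWriter) can stay as it is. Since Java 11, FileWriter also has constructors that take a Charset, which would be a smaller change.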

ISO-8859-15 is a single-byte encoding. Java, however, holds text in Unicode, so it can combine all scripts, and a char is a UTF-16 code unit. In general a char index is not a byte index, but in your case it probably works. Note that a line break may be one (\n) or two (\r\n) chars/bytes, depending on the platform.
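Whether "it probably works" holds for a given text can be checked before writing. The sketch below (class and method names are hypothetical) uses CharsetEncoder.canEncode to test that every char maps to ISO-8859-15, and shows that the choice of line break changes the byte count.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

class OffsetCheck {
    static final Charset LATIN9 = Charset.forName("ISO-8859-15");

    // True if the whole text is encodable in ISO-8859-15, in which case
    // each char becomes exactly one byte and a char index is also a byte index.
    public static boolean charIndexEqualsByteIndex(String text) {
        CharsetEncoder encoder = LATIN9.newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        System.out.println(charIndexEqualsByteIndex("f\u00f6r \u20ac")); // true: ö and the euro sign exist in ISO-8859-15
        System.out.println(charIndexEqualsByteIndex("\u65e5\u672c"));    // false: not representable
        System.out.println("a\nb".getBytes(LATIN9).length);   // 3
        System.out.println("a\r\nb".getBytes(LATIN9).length); // 4
        // Length of the separator that BufferedWriter.newLine() would write on this platform.
        System.out.println(System.lineSeparator().length());
    }
}
```

Writing an explicit "\n" instead of calling newLine() keeps the offsets identical on every platform, as long as the positions were computed from strings that contain the same "\n".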

Re

Personally, I think UTF-8 is well established, and it is easier to use:

byte[] bytes = string.getBytes(StandardCharsets.UTF_8);
string = new String(bytes, StandardCharsets.UTF_8);

That way all special quotes, the euro sign, and so on will always be available.

At least specify the encoding:

Files.newBufferedWriter(file.toPath(), Charset.forName("ISO-8859-15"));
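Putting the pieces together, a minimal sketch of writing with an explicit single-byte encoding and then jumping to a character position with RandomAccessFile might look like this. The class name, the helper methods, and the temporary file are assumptions for illustration; the approach only works when every character in the file is encodable in ISO-8859-15.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

class SeekByCharPosition {
    static final Charset LATIN9 = Charset.forName("ISO-8859-15");

    // Writes the text with an explicit single-byte encoding.
    public static void write(Path file, String text) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(file, LATIN9)) {
            out.write(text);
        }
    }

    // Reads `length` chars starting at char position `pos`.
    // Valid only because one char is one byte in this encoding.
    public static String readAt(Path file, long pos, int length) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(pos);
            byte[] buffer = new byte[length];
            raf.readFully(buffer);
            return new String(buffer, LATIN9);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("wordIndex", ".txt");
        write(file, "f\u00f6r\nbar\n"); // explicit \n so offsets do not depend on the platform
        System.out.println(readAt(file, 4, 3)); // prints "bar"
        Files.delete(file);
    }
}
```

Keeping UTF-8 instead would require storing byte offsets at write time rather than character positions, which is a different design from the one asked about.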
