Write a file with a specific encoding in Java

Question

This might be related to my previous question (on how to convert "fÃ¶r" to "för").

So I have a file that I create in my code. Right now I create it with the following code:

FileWriter fwOne = new FileWriter(wordIndexPath);
BufferedWriter wordIndex = new BufferedWriter(fwOne);

followed by a few

wordIndex.write(wordBuilder.toString()); //that's a StringBuilder

ending (after a while-loop) with a

wordIndex.close();

The problem is that this file later becomes huge, and I want (need) to jump into it without going through the entire file. The seek(long pos) method of RandomAccessFile lets me do this.

Here's my problem: the characters in the file I've created seem to be encoded in UTF-8, and the only information I have when I seek is the character position I want to jump to. seek(long pos), on the other hand, jumps in bytes, so I don't end up in the right place, since a UTF-8 character can be more than one byte.
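To make the mismatch concrete: "för" is three chars, but in UTF-8 the ö is encoded as two bytes, so every byte offset after it is one larger than the corresponding char offset. A minimal sketch (the class name is only illustrative; the ö is written as \u00f6 so the example does not depend on the encoding javac reads the source file with):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharVsByteLength {
    // Number of bytes the text occupies when encoded with the given charset.
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String word = "f\u00f6r"; // "för"
        Charset latin9 = Charset.forName("ISO-8859-15");
        System.out.println("chars:             " + word.length());                            // 3
        System.out.println("UTF-8 bytes:       " + byteLength(word, StandardCharsets.UTF_8)); // 4
        System.out.println("ISO-8859-15 bytes: " + byteLength(word, latin9));                 // 3
    }
}
```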

Here's my question: when I write the file, can I write it in ISO-8859-15 instead (where each character is one byte)? That way, seek(long pos) would get me to the right position. Or should I instead use an alternative to RandomAccessFile (is there one that lets you jump to a character position)?
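A minimal sketch of the approach the question proposes, assuming every character of the text exists in ISO-8859-15: write the text in that charset, then use the char index directly as the byte offset for RandomAccessFile.seek. The class and method names are hypothetical, and the sketch writes a temporary file only to stay self-contained; a real index would be written once and the RandomAccessFile kept open for repeated lookups.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

class SeekByCharIndex {
    // ISO-8859-15 (Latin-9) maps every character it supports to exactly one byte.
    static final Charset LATIN9 = Charset.forName("ISO-8859-15");

    // Writes text in Latin-9, seeks to charIndex and decodes the single byte found there.
    // Assumption: every char in text is representable in Latin-9 (unmappable chars become '?').
    static char charAt(String text, long charIndex) {
        try {
            Path file = Files.createTempFile("wordIndex", ".txt");
            try {
                Files.write(file, text.getBytes(LATIN9));
                try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
                    raf.seek(charIndex);       // byte offset == char index in a single-byte encoding
                    byte b = raf.readByte();   // throws EOFException past the end of the file
                    return new String(new byte[] { b }, LATIN9).charAt(0);
                }
            } finally {
                Files.delete(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "f\u00f6r\nf\u00e5r\n"; // "för\nfår\n"
        System.out.println(charAt(text, 5) == '\u00e5'); // true: index 5 is the å
    }
}
```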

Answer 1

First, the worrying part: FileWriter and FileReader are old utility classes that use the default platform encoding of the computer they run on. Run elsewhere, the same code will produce a different file, and may not be able to read a file written on another machine.
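One way to keep the FileWriter/BufferedWriter structure from the question while removing the dependence on the platform default: since Java 11, FileWriter has a constructor that takes an explicit Charset (on older versions, an OutputStreamWriter wrapped around a FileOutputStream does the same). The sketch below, with hypothetical names, writes the same word with two charsets and reports the resulting file sizes, which no longer depend on the machine.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ExplicitCharsetWriter {
    // Writes text with the given charset (FileWriter(String, Charset), Java 11+)
    // and returns the size of the resulting file in bytes.
    static long writeAndMeasure(String text, Charset charset) {
        try {
            Path file = Files.createTempFile("wordIndex", ".txt");
            try {
                try (BufferedWriter out = new BufferedWriter(new FileWriter(file.toString(), charset))) {
                    out.write(text);
                }
                return Files.size(file);
            } finally {
                Files.delete(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String word = "f\u00f6r"; // "för"
        System.out.println("platform default: " + Charset.defaultCharset()); // varies by machine/JDK
        System.out.println("UTF-8:       " + writeAndMeasure(word, StandardCharsets.UTF_8));        // 4
        System.out.println("ISO-8859-15: " + writeAndMeasure(word, Charset.forName("ISO-8859-15"))); // 3
    }
}
```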

ISO-8859-15 is a single-byte encoding. But Java holds text as Unicode, so it can combine all scripts, and a char is a UTF-16 code unit. In general a char index will not be a byte index, but in your case it probably works. However, a line break may be one \n or two \r\n chars/bytes, depending on the platform.
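The line-break caveat directly affects seek offsets: BufferedWriter.newLine() writes System.lineSeparator(), which is "\r\n" on Windows and "\n" on Linux and macOS. A small sketch (the helper is hypothetical) of how line start offsets shift with the separator, even in a single-byte charset:

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

class LineOffsets {
    // Byte offset at which each line starts when every line is followed by the given
    // separator and the whole text is encoded with the given charset.
    static List<Long> lineStarts(List<String> lines, String separator, Charset charset) {
        List<Long> starts = new ArrayList<>();
        long offset = 0;
        for (String line : lines) {
            starts.add(offset);
            offset += (line + separator).getBytes(charset).length;
        }
        return starts;
    }

    public static void main(String[] args) {
        Charset latin9 = Charset.forName("ISO-8859-15");
        List<String> lines = List.of("f\u00f6r", "f\u00e5r", "och");
        System.out.println(lineStarts(lines, "\n", latin9));                   // [0, 4, 8]
        System.out.println(lineStarts(lines, "\r\n", latin9));                 // [0, 5, 10]
        System.out.println(lineStarts(lines, System.lineSeparator(), latin9)); // platform dependent
    }
}
```

Writing '\n' explicitly instead of calling newLine() keeps the offsets identical on every platform.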

Re:

Personally, I think UTF-8 is well established and easier to use:

byte[] bytes = string.getBytes(StandardCharsets.UTF_8);
string = new String(bytes, StandardCharsets.UTF_8);
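If you follow the UTF-8 suggestion, the two lines above round-trip any Java string. Seeking still needs byte offsets, though. One option, sketched here under the assumption that you know the text preceding the target position, is to take the UTF-8 length of that prefix; in practice it is often simpler to record byte offsets while writing the file. The names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class Utf8Offsets {
    // Encode to UTF-8 and decode back; lossless for any well-formed Java string.
    static String roundTrip(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Byte offset of charIndex in the UTF-8 encoding of text: the encoded length of the prefix.
    // This is the value to pass to RandomAccessFile.seek if the file holds exactly this text.
    // Assumption: charIndex does not split a surrogate pair.
    static long utf8ByteOffset(String text, int charIndex) {
        return text.substring(0, charIndex).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String text = "f\u00f6r\nf\u00e5r\n"; // "för\nfår\n"
        System.out.println(roundTrip(text).equals(text)); // true
        System.out.println(utf8ByteOffset(text, 5));       // 6: the ö before it takes two bytes
    }
}
```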

That way, all special quotation marks, the euro sign, and so on will always be available.
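For comparison with ISO-8859-15 specifically: that charset does contain the euro sign (it is one of its changes from ISO-8859-1), but not typographic quotation marks such as “ and ”. String.getBytes silently replaces characters a charset cannot encode, which for ISO-8859-15 means '?'. A small illustrative sketch:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class WhatSurvives {
    // Encodes with the charset and decodes back; characters the charset cannot
    // represent are replaced by its replacement byte ('?' for ISO-8859-15).
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        Charset latin9 = Charset.forName("ISO-8859-15");
        String text = "\u201c5 \u20ac\u201d"; // curly quotes around "5 €"
        System.out.println(roundTrip(text, latin9).equals("?5 \u20ac?"));          // true: quotes lost
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text)); // true: nothing lost
        System.out.println(latin9.newEncoder().canEncode('\u20ac'));             // true
    }
}
```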

If you stay with ISO-8859-15, at least specify the encoding explicitly:

BufferedWriter wordIndex = Files.newBufferedWriter(file.toPath(), Charset.forName("ISO-8859-15")); // takes a Charset, not a String
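A sketch of the question's writing code with the encoding made explicit, assuming Java 9+ (for List.of). The entries list and the helper methods are hypothetical stand-ins for the question's while-loop and StringBuilder; try-with-resources replaces the explicit wordIndex.close().

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class WordIndexWriter {
    static final Charset LATIN9 = Charset.forName("ISO-8859-15");

    // Writes each entry followed by '\n' (not newLine(), so offsets are platform independent).
    static void writeIndex(Path wordIndexPath, List<String> entries) {
        try (BufferedWriter wordIndex = Files.newBufferedWriter(wordIndexPath, LATIN9)) {
            for (String entry : entries) {
                StringBuilder wordBuilder = new StringBuilder(entry).append('\n');
                wordIndex.write(wordBuilder.toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the entries to a temporary file and returns its raw bytes, to inspect the encoding.
    static byte[] writtenBytes(List<String> entries) {
        try {
            Path path = Files.createTempFile("wordIndex", ".txt");
            try {
                writeIndex(path, entries);
                return Files.readAllBytes(path);
            } finally {
                Files.delete(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = writtenBytes(List.of("f\u00f6r", "f\u00e5r"));
        System.out.println(bytes.length); // 8: one byte per char, including each '\n'
    }
}
```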

Answer 1: score 4, accepted, 2016-09-01 08:58:43