
Creating a UTF-8 File in Java

I'm writing a program that saves Chinese words to a text file. I create the text file in Java and then try to write words to it. However, the file I create is never encoded in UTF-8. This is the code I'm using; why doesn't it work? I was told that there is a bug inherent in Java, but I have no idea how to get around it.

public void createFile(String name) {
    try {
        BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
        new FileOutputStream(name +".txt"), "UTF-8"));
        out.write("");
    }
    catch(java.io.IOException e) {
        System.err.println("Something went wrong.");
    }
}

Also, is there another option besides text files that would still let me use UTF encoding?

I'm also testing the encoding by opening the TextEdit application and trying to write Chinese characters. Could this also be a problem?

First, files themselves don't have encodings; they're just a sequence of bytes. If you write "asdf" in UTF-8, the result is byte-for-byte indistinguishable from plain old 7-bit ASCII.
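The point about "asdf" can be checked directly. The following is a minimal sketch, not part of the original answer; the class name AsciiVsUtf8 and the helper sameBytes are made up for illustration. It compares the bytes a string produces under US-ASCII and under UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AsciiVsUtf8 {
    // Hypothetical helper: true when the text encodes to identical bytes
    // in US-ASCII and in UTF-8.
    public static boolean sameBytes(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(ascii, utf8);
    }

    public static void main(String[] args) {
        // ASCII-only text: both encodings yield the same four bytes.
        System.out.println(sameBytes("asdf"));          // prints true
        // Two Chinese characters (written as escapes so the source stays ASCII):
        // the encodings differ, so the bytes differ.
        System.out.println(sameBytes("\u4E2D\u6587"));  // prints false
    }
}
```

So a file that contains only ASCII characters gives an editor nothing to tell UTF-8 and ASCII apart; the difference only shows up once non-ASCII characters are written.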

If you were writing in, say, UTF-16, then a byte-order mark (BOM) at the start of the file would be a pretty clear indication that it's written in UTF-16, even with an empty string, but UTF-8 does not require such a marker to be present.

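Therefore, your editor has no way of knowing that this file is supposed to be written in UTF-8. You could write UTF-8's BOM (the bytes EF BB BF) to your file with:

out.write(new byte[] { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF });

However, in this case, out would have to be an OutputStream, such as the FileOutputStream. (BufferedWriter and OutputStreamWriter do not accept byte arrays for input.) Passing the three bytes as a single int, as in out.write(0xEFBBBF), would not work, because OutputStream.write(int) writes only the low-order eight bits of its argument. With a Writer that encodes as UTF-8, writing the character '\uFEFF' produces the same three bytes.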
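As a rough sketch of the BOM approach described above (the class name Utf8BomDemo, the helper writeWithBom, and the temp-file name are assumptions for illustration, not taken from the original post), the BOM bytes can be written to the raw stream first and the text written through a UTF-8 writer afterwards:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8BomDemo {
    // Hypothetical helper: writes the three UTF-8 BOM bytes, then the text as UTF-8.
    public static void writeWithBom(Path file, String text) throws IOException {
        try (OutputStream fos = Files.newOutputStream(file)) {
            // The BOM goes to the byte stream directly, before any text.
            fos.write(new byte[] { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF });
            Writer out = new OutputStreamWriter(fos, StandardCharsets.UTF_8);
            out.write(text);
            out.flush(); // push the encoded characters to fos before it is closed
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("bom-demo", ".txt"); // illustrative file name
        writeWithBom(file, "\u4E2D\u6587");
        byte[] bytes = Files.readAllBytes(file);
        // Show the first three bytes of the file in hex.
        System.out.printf("%02X %02X %02X%n",
                bytes[0] & 0xFF, bytes[1] & 0xFF, bytes[2] & 0xFF); // prints EF BB BF
    }
}
```

Whether a BOM is desirable depends on which tools will read the file: some editors use it as an encoding hint, while other programs may treat it as unexpected leading bytes.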

Try the following code. It worked for me: the file was written out as UTF-8, and I was able to open it with Notepad++, which confirmed that the encoding was UTF-8. The characters were encoded correctly. I got the characters from http://www.khngai.com/chinese/charmap/tbluni.php.

package testutf8;

import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UnsupportedEncodingException;
import java.io.Writer;

public class TestUTF8 {
  public static void main(String[] args) throws FileNotFoundException, UnsupportedEncodingException, IOException {
    String str = "Unicode Character Map, 0x4E00 - 0x4FFF\n" +
                 "4E00   一   丁   丂   七   丄   丅   丆   万   丈   三   上   下   丌   不   与   丏\n" +
                 "4E10   丐   丑   丒   专   且   丕   世   丗   丘   丙   业   丛   东   丝   丞   丟\n" +
                 "4E20   丠   両   丢   丣   两   严   並   丧   丨   丩   个   丫   丬   中   丮   丯\n" +
                 "4E30   丰   丱   串   丳   临   丵   丶   丷   丸   丹   为   主   丼   丽   举   丿\n" +
                 "4E40   乀   乁   乂   乃   乄   久   乆   乇   么   义   乊   之   乌   乍   乎   乏\n" +
                 "4E50   乐   乑   乒   乓   乔   乕   乖   乗   乘   乙   乚   乛   乜   九   乞   也\n" +
                 "4E60   习   乡   乢   乣   乤   乥   书   乧   乨   乩   乪   乫   乬   乭   乮   乯\n" +
                 "4E70   买   乱   乲   乳   乴   乵   乶   乷   乸   乹   乺   乻   乼   乽   乾   乿\n" +
                 "4E80   亀   亁   亂   亃   亄   亅   了   亇   予   争   亊   事   二   亍   于   亏\n" +
                 "4E90   亐   云   互   亓   五   井   亖   亗   亘   亙   亚   些   亜   亝   亞   亟\n" +
                 "4EA0   亠   亡   亢   亣   交   亥   亦   产   亨   亩   亪   享   京   亭   亮   亯\n" +
                 "4EB0   亰   亱   亲   亳   亴   亵   亶   亷   亸   亹   人   亻   亼   亽   亾   亿\n" +
                 "4EC0   什   仁   仂   仃   仄   仅   仆   仇   仈   仉   今   介   仌   仍   从   仏\n" +
                 "4ED0   仐   仑   仒   仓   仔   仕   他   仗   付   仙   仚   仛   仜   仝   仞   仟\n" +
                 "4EE0   仠   仡   仢   代   令   以   仦   仧   仨   仩   仪   仫   们   仭   仮   仯\n" +
                 "4EF0   仰   仱   仲   仳   仴   仵   件   价   仸   仹   仺   任   仼   份   仾   仿\n" +
                 "4F00   伀   企   伂   伃   伄   伅   伆   伇   伈   伉   伊   伋   伌   伍   伎   伏\n" +
                 "4F10   伐   休   伒   伓   伔   伕   伖   众   优   伙   会   伛   伜   伝   伞   伟\n" +
                 "4F20   传   伡   伢   伣   伤   伥   伦   伧   伨   伩   伪   伫   伬   伭   伮   伯\n" +
                 "4F30   估   伱   伲   伳   伴   伵   伶   伷   伸   伹   伺   伻   似   伽   伾   伿\n" +
                 "4F40   佀   佁   佂   佃   佄   佅   但   佇   佈   佉   佊   佋   佌   位   低   住\n" +
                 "4F50   佐   佑   佒   体   佔   何   佖   佗   佘   余   佚   佛   作   佝   佞   佟\n" +
                 "4F60   你   佡   佢   佣   佤   佥   佦   佧   佨   佩   佪   佫   佬   佭   佮   佯\n" +
                 "4F70   佰   佱   佲   佳   佴   併   佶   佷   佸   佹   佺   佻   佼   佽   佾   使\n" +
                 "4F80   侀   侁   侂   侃   侄   侅   來   侇   侈   侉   侊   例   侌   侍   侎   侏\n" +
                 "4F90   侐   侑   侒   侓   侔   侕   侖   侗   侘   侙   侚   供   侜   依   侞   侟\n" +
                 "4FA0   侠   価   侢   侣   侤   侥   侦   侧   侨   侩   侪   侫   侬   侭   侮   侯\n" +
                 "4FB0   侰   侱   侲   侳   侴   侵   侶   侷   侸   侹   侺   侻   侼   侽   侾   便\n" +
                 "4FC0   俀   俁   係   促   俄   俅   俆   俇   俈   俉   俊   俋   俌   俍   俎   俏\n" +
                 "4FD0   俐   俑   俒   俓   俔   俕   俖   俗   俘   俙   俚   俛   俜   保   俞   俟\n" +
                 "4FE0   俠   信   俢   俣   俤   俥   俦   俧   俨   俩   俪   俫   俬   俭   修   俯\n" +
                 "4FF0   俰   俱   俲   俳   俴   俵   俶   俷   俸   俹   俺   俻   俼   俽   俾   俿\n";

    // try-with-resources flushes and closes the writer even if write() throws
    try (Writer out = new OutputStreamWriter(new FileOutputStream("tmp.txt"), "UTF-8")) {
      out.write(str);
    }
  }
}

This may be a TextEdit usage issue.

If there are no non-ASCII characters in the file you're writing, TextEdit's algorithm to determine encoding will likely land on ASCII or a Latin-1 variant.

You can specify a text file's encoding in the File->Open dialog. I'm not sure whether TextEdit remembers this decision on future double-clicks of this file.

Try "UTF-8" instead of "UTF8" as the charset name. This might solve your problem.
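To see how a given charset name resolves, a small check like the following may help. It is only a sketch: the class name CharsetNameDemo and the helper resolvesToUtf8 are invented here, and it assumes Java 7 or later for StandardCharsets. Using the StandardCharsets.UTF_8 constant also sidesteps the spelling question, since no string name is involved.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetNameDemo {
    // Hypothetical helper: true when the given name resolves to the UTF-8 charset.
    public static boolean resolvesToUtf8(String name) {
        return Charset.forName(name).equals(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Print how each spelling resolves on the running JDK.
        System.out.println("UTF-8 -> " + Charset.forName("UTF-8").name());
        System.out.println("UTF8  -> " + Charset.forName("UTF8").name());
        System.out.println(resolvesToUtf8("UTF-8"));
        System.out.println(resolvesToUtf8("UTF8"));
    }
}
```

If both spellings resolve to the same charset on your JDK, the name is unlikely to be the cause of the problem.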

I noticed that you didn't close your stream:

out.close();

Of course you didn't include the code that wrote the actual characters either...
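A minimal sketch of writing text and reliably closing the stream, assuming Java 7 or later (the class name WriteAndCloseDemo, the helper writeUtf8, and the temp-file name are illustrative, not from the question): try-with-resources closes the writer, which also flushes any buffered characters to the file.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class WriteAndCloseDemo {
    // Hypothetical helper: writes the text as UTF-8 and always closes the writer.
    public static void writeUtf8(Path file, String text) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            out.write(text);
        } // close() runs here and flushes the buffer to disk
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("words", ".txt"); // illustrative file name
        String words = "\u4F60\u597D"; // two Chinese characters, written as escapes
        writeUtf8(file, words);
        // Read the bytes back and decode them as UTF-8 to confirm the round trip.
        String readBack = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        System.out.println(readBack.equals(words)); // prints true
    }
}
```

Without the close (or at least a flush), a BufferedWriter may still be holding the characters in memory when the program exits, which can leave the file empty.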
