
How do I make Eclipse print out weird characters in Unicode?

I'm trying to make my program output a text file containing a list of names. Some of the names have weird characters, such as Åström.

I grabbed this list of names from a webpage that is encoded in UTF-8, or at least I'm pretty sure it is, because the page source says

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

This is what I've tried so far:

public static void write(List<String> list) throws IOException  {
        Writer out = new OutputStreamWriter(new FileOutputStream("test.txt"), "UTF-8");
        try {
            for (int i=0;i<list.size();i++) {
                try {
                    byte[] utf8Bytes = list.get(i).getBytes("UTF-8");
                    out.write(new String(utf8Bytes, "UTF-8"));
                } catch (UnsupportedEncodingException e) {
                    e.printStackTrace();
                }

                out.write(System.getProperty("line.separator"));

            }
        }
        finally {
        out.close();
        }
    }

and I'm a little confused as to why it's not working. The output I get is "Ã…ström", which is very weird.

Can someone please point me in the right direction? Thanks!

On an unrelated note, is there an easier way to write a new line to a text file than the clunky

out.write(System.getProperty("line.separator"));

that I have? I saw that online somewhere and it works, but I was wondering if there is a cleaner way.

Set Eclipse > Preferences > General > Workspace > Text file encoding to UTF-8.

The content is indeed in UTF-8 and it appears OK if printed to the console. What may be causing the problem is the decoding and re-encoding of the string, which is unnecessary. Instead of an OutputStreamWriter, try using a java.io.PrintWriter. It has println methods that print the string followed by the system line separator. It would look something like:

printStream.println(list.get(i));
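
As a minimal sketch of that suggestion: the class name NameFileWriter, the write(names, target) signature, and the sample names are hypothetical, chosen only for illustration. The PrintWriter is wrapped around an OutputStreamWriter so that the UTF-8 encoding is stated explicitly instead of falling back to the platform default.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of the original question.
class NameFileWriter {

    // Writes one name per line, encoding the file as UTF-8.
    static void write(List<String> names, Path target) throws IOException {
        try (PrintWriter out = new PrintWriter(
                new OutputStreamWriter(Files.newOutputStream(target), StandardCharsets.UTF_8))) {
            for (String name : names) {
                out.println(name); // println appends the platform line separator
            }
            // PrintWriter does not throw IOException; checkError flushes and reports failures.
            if (out.checkError()) {
                throw new IOException("Failed writing " + target);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Unicode escapes keep this source file independent of the compiler's source encoding.
        write(Arrays.asList("\u00C5str\u00F6m", "M\u00FCller"), Paths.get("test.txt"));
    }
}
```

The try-with-resources block closes the writer even if an exception is thrown, which replaces the explicit try/finally in the question.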

Also, when opening the file to check it, try using a browser. Browsers let you choose the encoding after opening the file, so you can quickly try several encodings to see which one is really being used.
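
The same check can be approximated in code. This is only an illustrative sketch: the class MojibakeDemo and its decodeAs method are hypothetical names, and windows-1252 is an assumed guess at what the viewer used. It decodes the UTF-8 bytes of the name with two different charsets; the wrong one yields a string starting with "Ã…", like the output in the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class showing how a viewer's charset choice changes what you see.
class MojibakeDemo {

    // Decodes the given bytes using the given charset.
    static String decodeAs(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String name = "\u00C5str\u00F6m";
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);

        // Correct charset: the original six characters come back.
        System.out.println(decodeAs(utf8, StandardCharsets.UTF_8));

        // Wrong charset: every UTF-8 byte is shown as its own character,
        // so the two bytes of the first letter become U+00C3 and U+2026.
        System.out.println(decodeAs(utf8, Charset.forName("windows-1252")));
    }
}
```

If the second line resembles what the editor shows, the file itself is probably fine and the viewer is picking the wrong encoding.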

Notepad is not a particularly feature-rich editor. It will attempt to guess the document encoding, sometimes with unexpected results. "Plain text" documents don't carry any metadata about their encoding, which gives them certain limitations. Windows apps (Notepad included) often rely on the byte-order mark (U+FEFF, or "\uFEFF" in Java strings) to determine whether the encoding is a Unicode format. That might help out Notepad; it's going to be useless for your web page problem.
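
A rough sketch of writing that byte-order mark, assuming you do want one: the class BomWriter and the method writeWithBom are hypothetical names, not anything from the question. The only change from a normal UTF-8 write is that U+FEFF is written as the first character.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that prefixes a UTF-8 file with a byte-order mark.
class BomWriter {

    static void writeWithBom(List<String> lines, Path target) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            out.write('\uFEFF'); // U+FEFF is encoded as the bytes EF BB BF in UTF-8
            for (String line : lines) {
                out.write(line);
                out.newLine(); // platform line separator
            }
        }
    }

    public static void main(String[] args) throws IOException {
        writeWithBom(Arrays.asList("\u00C5str\u00F6m"), Paths.get("test-bom.txt"));
    }
}
```

Many non-Windows tools treat the mark as ordinary content, so it is usually worth adding only when the file is meant for applications that rely on it.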

The HTML 4 spec defines how output encoding should be set. You should set the Content-Type HTTP header in addition to specifying the meta encoding.

You don't mention what you're using in your web app. A servlet should set the content type with setContentType("text/html; charset=UTF-8"); a JSP should use the page directive to do the same. Other view technologies will provide similar mechanisms.


byte[] utf8Bytes = list.get(i).getBytes("UTF-8");
out.write(new String(utf8Bytes, "UTF-8"));

This code performs some useless operations; it transcodes character data from UTF-16 to UTF-8, then back from UTF-8 to UTF-16, then writes data to a Writer (which will transcode the UTF-16 to UTF-8 again). This code is equivalent:

String str = list.get(i);
out.write(str);
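
To see that the round trip changes nothing, here is a small sketch; the class RoundTripDemo and the method roundTrip are hypothetical names used only for this illustration. It encodes a string to UTF-8 and decodes it again, then compares the result with the original.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: encoding to UTF-8 and decoding again returns an equal string.
class RoundTripDemo {

    // Mirrors the two lines from the question, using a Charset constant
    // so no UnsupportedEncodingException has to be handled.
    static String roundTrip(String s) {
        byte[] utf8Bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "\u00C5str\u00F6m";
        System.out.println(roundTrip(name).equals(name)); // prints "true"
    }
}
```

This holds for well-formed text such as the names here, so the extra step can simply be dropped.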

Use a PrintWriter to get newline support.


You can read more about character encoding in Java here, here and here.
