Unicode characters in a Ruby script?

Question

I would like to write a Ruby script that writes Japanese characters to the console. For example:

puts "こんにちは・今日は"

However, I get an exception when running it:

jap.rb:1: Invalid char `\377' in expression
jap.rb:1: Invalid char `\376' in expression

Is this possible? I'm using Ruby 1.8.6.

Answer 1

You've saved the file in the UTF-16LE encoding, the one Windows misleadingly calls “Unicode”. This encoding is generally best avoided because it's not an ASCII superset: each code unit is stored as two bytes, so every ASCII character carries a second byte of \0. This confuses an awful lot of software; it is unusual to use UTF-16 for file storage.

What you are seeing with \377 and \376 (octal for \xFF and \xFE) is the U+FEFF Byte Order Mark, which is put at the front of UTF-16 files to distinguish UTF-16LE from UTF-16BE.
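As a rough illustration of the point above, the leading bytes of a file can be checked against the common byte order marks. This is only a sketch: `detect_bom` and the `BOMS` table are hypothetical names, not anything from Ruby's standard library, and `String#b` assumes Ruby 2.0 or later (it would not run on 1.8.6).

```ruby
# Well-known byte order marks and the encoding each one signals.
BOMS = {
  "\xFF\xFE".b     => "UTF-16LE",
  "\xFE\xFF".b     => "UTF-16BE",
  "\xEF\xBB\xBF".b => "UTF-8 (with BOM)"
}

# Returns the encoding name implied by a leading BOM, or nil if none is found.
# `detect_bom` is a hypothetical helper name, not a Ruby built-in.
def detect_bom(bytes)
  head = bytes.b # compare as raw bytes, ignoring any encoding tag
  BOMS.each do |bom, name|
    return name if head.start_with?(bom)
  end
  nil
end

# First bytes of a file saved as Windows "Unicode": the BOM, then "p" and "u",
# each followed by a zero byte.
sample = "\xFF\xFEp\x00u\x00".b
puts detect_bom(sample).inspect
```

For a real file, something like `detect_bom(File.binread("jap.rb"))` would apply the same check to the script from the question.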

Ruby 1.8 is totally byte-based; it makes no attempt to read Unicode characters from a script. So you can only save source files in ASCII-compatible encodings. Normally, you'd want to save your files as UTF-8 (without BOM; the UTF-8 faux-BOM is another great Microsoft innovation that breaks everything). This would work well for scripts on the web producing UTF-8 pages.
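If re-saving from the editor is not convenient, the conversion to BOM-less UTF-8 could also be scripted. The following is a minimal sketch, assuming Ruby 1.9 or later (where `String#encode` exists; Ruby 1.8 would need a library such as Iconv instead). `utf16le_to_utf8` is a hypothetical helper name.

```ruby
# Converts raw UTF-16LE bytes (optionally starting with a BOM) into a UTF-8
# string without a BOM. `utf16le_to_utf8` is a hypothetical helper name.
def utf16le_to_utf8(raw)
  text = raw.dup.force_encoding("UTF-16LE").encode("UTF-8")
  text.sub(/\A\uFEFF/, "") # drop the byte order mark if one is present
end

# "puts" as UTF-16LE with a leading BOM.
raw = "\xFF\xFEp\x00u\x00t\x00s\x00".b
puts utf16le_to_utf8(raw)

# To convert a file in place, one could write:
#   File.binwrite("jap.rb", utf16le_to_utf8(File.binread("jap.rb")))
```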

And if you wanted to be sure the source code would tolerate being saved in any ASCII-compatible encoding, you could write the string as escaped bytes to make it more resilient (if less readable):

puts "\xe3\x81\x93\xe3\x82\x93\xe3\x81\xab\xe3\x81\xa1\xe3\x81\xaf\xe3\x83\xbb\xe4\xbb\x8a\xe6\x97\xa5\xe3\x81\xaf"
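The escaped form above does not need to be typed by hand. As a sketch, a few lines of Ruby can produce it from the readable string, provided the script containing the literal is itself saved as UTF-8. `hex_escape` is a hypothetical helper name.

```ruby
# encoding: utf-8

# Renders every byte of a string as a \xNN escape.
# `hex_escape` is a hypothetical helper name.
def hex_escape(str)
  str.bytes.map { |byte| format("\\x%02x", byte) }.join
end

# Prints the escaped UTF-8 bytes of the greeting from the question.
puts hex_escape("こんにちは・今日は")
```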

However! Writing to the console is itself a big problem. The encoding used to send characters to the console varies from platform to platform. On Linux or OS X, it's UTF-8. On Windows, it's a different encoding for every installation locale (as selected under “Language for non-Unicode applications” in the “Regional and Language Options” control panel entry), but it's never UTF-8. This setting is, again misleadingly, known as the ANSI code page.
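To see what a given machine reports, the encodings Ruby picked up from the environment can be printed. This is a sketch assuming Ruby 1.9 or later, since the `Encoding` class does not exist in 1.8; `console_encoding_report` is a hypothetical helper name, and the values depend entirely on the platform and locale.

```ruby
# Collects the encodings Ruby reports for the current environment.
# `console_encoding_report` is a hypothetical helper name.
def console_encoding_report
  {
    "locale charmap"   => Encoding.locale_charmap,
    "default external" => Encoding.default_external.name,
    # external_encoding may be nil for a write-only stream
    "stdout external"  => ($stdout.external_encoding || "(none set)").to_s
  }
end

console_encoding_report.each do |label, value|
  puts "#{label}: #{value}"
end
```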

So if you are using a Japanese Windows install, your console encoding will be Windows code page 932 (a variant of Shift-JIS). If that's the case, you can save the text file from a text editor using “ANSI” or explicitly “Japanese cp932”, and when you run it in Ruby you'll get the right characters out. Again, if you wanted to make the source withstand misencoding, you could escape the string in cp932 encoding:

puts "\x82\xb1\x82\xf1\x82\xc9\x82\xbf\x82\xcd\x81E\x8d\xa1\x93\xfa\x82\xcd"
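One way to check that these escapes really are the cp932 bytes of the greeting is to convert in both directions. This is a sketch assuming Ruby 1.9 or later, where code page 932 is available under the encoding name `Windows-31J`; the helper names `cp932_to_utf8` and `utf8_to_cp932_bytes` are hypothetical.

```ruby
# encoding: utf-8

# Interprets raw bytes as cp932 and returns the text as UTF-8.
# `cp932_to_utf8` is a hypothetical helper name.
def cp932_to_utf8(bytes)
  bytes.dup.force_encoding("Windows-31J").encode("UTF-8")
end

# Returns the cp932 byte values of a UTF-8 string.
# `utf8_to_cp932_bytes` is a hypothetical helper name.
def utf8_to_cp932_bytes(text)
  text.encode("Windows-31J").bytes
end

escaped = "\x82\xb1\x82\xf1\x82\xc9\x82\xbf\x82\xcd\x81E\x8d\xa1\x93\xfa\x82\xcd"
puts cp932_to_utf8(escaped)

# The round trip should give back the same bytes as the escaped literal.
puts utf8_to_cp932_bytes("こんにちは・今日は") == escaped.bytes
```

Note that the `E` after `\x81` is an ordinary ASCII letter: in cp932 the second byte of a two-byte character can fall in the printable ASCII range, which is why the middle dot appears as `\x81E`.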

But if you run it on a machine in another locale, it'll produce different characters. You will be unable to write Japanese to the default console from Ruby on a Western Windows installation (code page 1252).
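The limitation can be seen without a Windows machine: code page 1252 simply has no byte sequences for Japanese characters, so a conversion attempt fails. The sketch below assumes Ruby 1.9 or later; `representable?` is a hypothetical helper name.

```ruby
# encoding: utf-8

# Reports whether every character of `text` exists in the named encoding.
# `representable?` is a hypothetical helper name.
def representable?(text, encoding_name)
  text.encode(encoding_name)
  true
rescue Encoding::UndefinedConversionError
  false
end

greeting = "こんにちは・今日は"
puts representable?(greeting, "Windows-31J")  # Japanese code page 932
puts representable?(greeting, "Windows-1252") # Western code page 1252
```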

(Whilst Ruby 1.9 improves Unicode handling a lot, it doesn't change anything here. It's still a bytes-based application using the C standard library IO functions, and that means it is limited to Windows's local code page.)

Answer 1
12, accepted, 2010-08-14 16:19:33
