
Ruby 1.9 - Invalid multibyte character (utf-8)

I have a Ruby file with only these two lines:

# encoding: utf-8
puts "—"

When I run it with ruby test_enc.rb it fails with:

test_enc.rb:2: invalid multibyte char (UTF-8)
test_enc.rb:2: unterminated string meets end of file

I don't know how to properly specify the character code of the em dash, but Vim tells me it is 151, hex 97, octal 227. It fails the same way with other characters like ã as well, so I doubt it is related specifically to that character. I am running on Windows XP and the version of Ruby I'm using is:

ruby 1.9.1p430 (2010-08-16 revision 28998) [i386-mingw32]

I feel like there is something very obvious I am missing here. Any ideas?

EDIT: Learned a valuable lesson about assumptions today: specifically, assuming your editor IS using UTF-8 without actually checking it. Oops!

Thanks for the quick and accurate replies, all!

EDIT AGAIN: The 'setting up vim properly for utf-8' part grew too big and wasn't really relevant to this question, so it is now a separate question.

Given that Ruby is explicitly calling your attention to UTF-8, I strongly suspect that you haven't actually written out a UTF-8 file to start with. Make sure that Vim (or whatever text editor you're using to create the file) is really set to write out UTF-8.
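One way to check this from Ruby itself, rather than trusting the editor, is to read the raw bytes and ask whether they form valid UTF-8. This is only a sketch: the helper name `valid_utf8_file?` is made up for illustration, and a pure-ASCII file will also pass, since ASCII is a subset of UTF-8.

```ruby
# Hypothetical helper (not from the original post): returns true if the
# raw bytes of the file at `path` are a valid UTF-8 sequence.
def valid_utf8_file?(path)
  # Read in binary mode so Ruby does not apply any encoding assumptions.
  data = File.open(path, "rb") { |f| f.read }
  # Relabel the bytes as UTF-8 (no conversion) and validate them.
  data.force_encoding("UTF-8").valid_encoding?
end
```

Calling `valid_utf8_file?("test_enc.rb")` on a file that contains a lone 0x97 byte should return false, which matches the "invalid multibyte char" error Ruby reports.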

Note that in UTF-8, any non-ASCII character will be represented by multiple bytes, not a single byte as you've described from the Vim diagnostics. I'd recommend using a binary file editor (or dump, or whatever) to really show what's in the text file though. Something that doesn't already have some preconceived notion of the encoding, something that isn't even trying to think of it as a text file.
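If no hex editor is at hand, a few lines of Ruby can serve as the dump. The sketch below assumes the file is small enough to read in one go; `hex_bytes` is a hypothetical helper name, not something from the original post.

```ruby
# Hypothetical helper: read a file in binary mode and return its bytes
# as space-separated, two-digit uppercase hex values.
def hex_bytes(path)
  data = File.open(path, "rb") { |f| f.read }
  data.unpack("C*").map { |b| format("%02X", b) }.join(" ")
end

# Dump the file named on the command line, defaulting to test_enc.rb
# (the file name used in the question) if it exists.
if __FILE__ == $0
  path = ARGV[0] || "test_enc.rb"
  puts hex_bytes(path) if File.exist?(path)
end
```

A single `97` between the two `22` (double quote) bytes means the dash was saved as one byte, so the file is not UTF-8.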

Notepad lets you write out a file in UTF-8, so you might want to try that just to see what happens. (I don't have Ruby installed myself, otherwise I'd try it for you.)

Your file is in latin1 (strictly speaking Windows-1252, which is where byte 0x97 is the em dash). Ruby is right.

In UTF-8 the em dash (U+2014) would be encoded as three bytes (E2 80 94), not one.
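The byte counts can be checked directly in Ruby 1.9 or later. This sketch builds the em dash from its code point (U+2014) so the script itself stays ASCII-only, then prints its bytes in UTF-8 and in Windows-1252; the variable names are just for illustration.

```ruby
# Build the em dash from its Unicode code point; pack("U") yields a UTF-8 string.
em_dash = [0x2014].pack("U")

# Bytes of the character as stored in a UTF-8 file.
utf8_bytes = em_dash.bytes.to_a

# Bytes after transcoding to Windows-1252, the usual Windows "ANSI" code page.
cp1252_bytes = em_dash.encode("Windows-1252").bytes.to_a

puts utf8_bytes.map { |b| format("%02X", b) }.join(" ")    # prints "E2 80 94"
puts cp1252_bytes.map { |b| format("%02X", b) }.join(" ")  # prints "97"
```

The single byte 97 in the second line is the value Vim reported (hex 97, decimal 151), which is why the file cannot have been saved as UTF-8.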
