
Ruby 1.9.2 Character Encoding: invalid multibyte character: /?/

I'm trying to understand why this snippet of code does not work in Ruby 1.9.2, and how it should be changed to make it work. Here is the snippet:

ruby-1.9.2-p290 :009 > str = "hello world!"
 => "hello world!" 
ruby-1.9.2-p290 :010 > str.gsub("\223","")
RegexpError: invalid multibyte character: /?/
    from (irb):10:in `gsub'

Your Ruby is in UTF-8 mode, but "\223" is not a valid UTF-8 string. In UTF-8, any byte with the eighth bit set belongs to a multi-byte character, and you need to keep reading more bytes to get the full character. The byte 0223 (binary 10010011) is a continuation byte, so "\223" on its own is just one piece of a UTF-8 encoded character, hence your error.
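
To see this for yourself, here is a minimal sketch using only core Ruby. The variable names are mine, and the encoding is forced explicitly so the result does not depend on the encoding of your source file:

```ruby
# Tag the single byte 0223 as UTF-8, regardless of the source file's encoding.
fragment = "\223".dup.force_encoding(Encoding::UTF_8)

first_byte = fragment.bytes.first   # 147 decimal, 0x93, binary 10010011

# UTF-8 continuation bytes always have the bit pattern 10xxxxxx.
continuation = (first_byte & 0b1100_0000) == 0b1000_0000

p first_byte                 # 147
p continuation               # true
p fragment.valid_encoding?   # false
```

A lone continuation byte can never form a complete character, which is why the regexp engine refuses it.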

0223 and 0224 (147 and 148 decimal) are the "smart" quotes in the Windows-1252 character set, but Windows-1252 isn't UTF-8. In UTF-8 you want "\u201c" and "\u201d" for the quotes:

>> puts "\u201c"
“
>> puts "\u201d"
”
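
If your data really does contain the raw Windows-1252 bytes, one option (my assumption about your situation, not something stated in the question) is to tag the bytes with their real encoding and transcode them to UTF-8. A rough sketch, with variable names of my own choosing:

```ruby
# Build a binary string holding the two raw bytes 0223 and 0224.
raw = [0x93, 0x94].pack("C*")

# Say what the bytes really are, then transcode them to UTF-8.
quotes = raw.force_encoding("Windows-1252").encode("UTF-8")

p quotes == "\u201c\u201d"   # true
p quotes.valid_encoding?     # true
```

After transcoding, the string holds the same two quote characters, now as valid UTF-8.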

So if you're trying to strip out the quotes, then you probably want one of these:

str.gsub("\u201c", "").gsub("\u201d", "")
str.gsub(/[\u201c\u201d]/, '')
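
For a quick check, here is a small sketch on a made-up string that actually contains the smart quotes (the sample text and variable names are only for illustration). String#delete is an alternative I'm adding, since no pattern matching is needed here:

```ruby
# Hypothetical input wrapped in left and right smart quotes.
str = "\u201chello world!\u201d"

stripped_regex  = str.gsub(/[\u201c\u201d]/, "")   # character class, as above
stripped_delete = str.delete("\u201c\u201d")       # same result without a regexp

puts stripped_regex    # hello world!
puts stripped_delete   # hello world!
```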
