Ruby 2.1.5 - ArgumentError: invalid byte sequence in UTF-8
I'm having trouble with UTF-8 characters in Ruby 2.1.5 and Rails 4.
The problem is that the data coming from an external service looks like this:
"first_name"=>"ezgi \xE7enberci"
"last_name" => "\xFC\xFE\xE7\xF0i\xFE\xFE\xF6\xE7"
These are mostly Turkish characters such as "üğşiçö". When the application tries to save this data, the following errors occur:
ArgumentError: invalid byte sequence in UTF-8
Mysql2::Error: Incorrect string value
How can I fix this?
Ruby thinks you have invalid byte sequences because your strings aren't UTF-8. You can get a rough idea of their actual encoding with a detection library, for example the rchardet gem:
require 'rchardet'
# Guess each raw string's encoding (a heuristic; confidence ranges from 0 to 1)
["ezgi \xE7enberci", "\xFC\xFE\xE7\xF0i\xFE\xFE\xF6\xE7"].map do |str|
  CharDet.detect(str)
end
#=> [{"encoding"=>"ISO-8859-2", "confidence"=>0.8600826867857209}, {"encoding"=>"windows-1255", "confidence"=>0.5807177322740268}]
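Even without a detection gem, plain Ruby can confirm that these strings are not valid UTF-8. A minimal sketch: the values are labelled UTF-8 but contain single-byte Latin characters such as \xE7, so valid_encoding? returns false, and an operation that has to decode characters (here a regex match) raises the same ArgumentError as in the question.

```ruby
# The raw values from the question. Ruby 2.0+ reads source files as UTF-8,
# so these literals carry the UTF-8 label even though the bytes are not
# UTF-8; force_encoding on a copy just makes that label explicit.
first_name = "ezgi \xE7enberci".dup.force_encoding(Encoding::UTF_8)
last_name  = "\xFC\xFE\xE7\xF0i\xFE\xFE\xF6\xE7".dup.force_encoding(Encoding::UTF_8)

[first_name, last_name].each do |str|
  # valid_encoding? checks the bytes against the string's encoding label.
  puts "#{str.encoding}: valid? #{str.valid_encoding?}"   # UTF-8: valid? false
end

# Anything that has to decode characters, such as a regex match,
# raises on the broken bytes.
begin
  first_name =~ /enberci/
rescue ArgumentError => e
  puts "#{e.class}: #{e.message}"   # ArgumentError: invalid byte sequence in UTF-8
end
```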
You need to clean up your strings first, using String#scrub or an encoding method such as String#encode!. For example:
hash = {"first_name"=>"ezgi \xE7enberci",
        "last_name"=>"\xFC\xFE\xE7\xF0i\xFE\xFE\xF6\xE7"}
# Reinterpret each value's bytes as ISO-8859-2 and convert them to UTF-8 in place
hash.each_value { |v| v.encode!("UTF-8", "ISO-8859-2") }
#=> {"first_name"=>"ezgi çenberci", "last_name"=>"üţçđiţţöç"}
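For comparison, String#scrub does not reinterpret the bytes at all; it only replaces the sequences that are invalid in the string's current encoding. A short sketch on the first name shows why that is a last resort for this data: the string becomes valid, but the Turkish letter is lost, whereas encode with an explicit source encoding keeps it.

```ruby
raw = "ezgi \xE7enberci".dup.force_encoding(Encoding::UTF_8)

# scrub keeps the UTF-8 label and replaces each invalid byte sequence
# (here the lone \xE7) with the given replacement; the letter is gone.
scrubbed = raw.scrub("?")
puts scrubbed                   # ezgi ?enberci
puts scrubbed.valid_encoding?   # true

# encode with an explicit source encoding reinterprets the bytes instead,
# so the character survives the conversion to UTF-8.
converted = raw.encode("UTF-8", "ISO-8859-2")
puts converted                  # ezgi çenberci
```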
You may need to experiment a bit to find the proper encoding (e.g. ISO-8859-2, windows-1255, or something else entirely), but making sure your data set is consistently encoded will be critical. The ISO-8859-2 result above, for instance, is not quite right for Turkish: ţ and đ should be ş and ğ.
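One way to run that experiment is to decode the same bytes with each candidate encoding and compare the results side by side. A minimal sketch; decode_candidates is a hypothetical helper, and the candidate list is an assumption (Turkish text is commonly encoded as windows-1254 or ISO-8859-9). A candidate that cannot convert some byte raises an EncodingError subclass, which the helper records instead of a result.

```ruby
# Hypothetical helper: decode the same raw bytes as each candidate
# encoding and collect the UTF-8 results in a hash keyed by encoding name.
def decode_candidates(bytes, candidates)
  candidates.map do |enc|
    decoded =
      begin
        # The second argument overrides the string's own encoding label.
        bytes.encode(Encoding::UTF_8, enc)
      rescue EncodingError => e
        "(#{e.class})"
      end
    [enc, decoded]
  end.to_h
end

last_name  = "\xFC\xFE\xE7\xF0i\xFE\xFE\xF6\xE7"
candidates = %w[ISO-8859-2 windows-1255 windows-1254 ISO-8859-9]

results = decode_candidates(last_name, candidates)
results.each { |enc, text| puts "#{enc.ljust(12)} #{text}" }
```

On these bytes, ISO-8859-2 yields "üţçđiţţöç" while windows-1254 and ISO-8859-9 both yield "üşçğişşöç", the only plausible run of Turkish letters, which agrees with the windows-1254 suggestion in the other answer.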
Character encoding detection is imperfect. Your best bet is to find out which encoding your external data source actually uses and convert from that, rather than trying to detect it automatically. Otherwise, your mileage may vary.
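Once the source encoding is known, the conversion can happen once, where the data enters the application, rather than field by field. A sketch assuming the service sends windows-1254; SOURCE_ENCODING and to_utf8_hash are hypothetical names. It returns a new hash instead of mutating the input with encode!, and it assumes every string in the payload uses that one encoding: a value that is already UTF-8 would be converted a second time and garbled.

```ruby
# Assumption for this sketch: the external service sends windows-1254.
SOURCE_ENCODING = Encoding::Windows_1254

# Hypothetical helper: returns a new hash whose String values are
# converted from the source encoding to UTF-8; other values pass through.
# Any EncodingError raised by encode is left to propagate to the caller.
def to_utf8_hash(payload, source = SOURCE_ENCODING)
  payload.map do |key, value|
    value = value.encode(Encoding::UTF_8, source) if value.is_a?(String)
    [key, value]
  end.to_h
end

payload = { "first_name" => "ezgi \xE7enberci",
            "last_name"  => "\xFC\xFE\xE7\xF0i\xFE\xFE\xF6\xE7" }
clean = to_utf8_hash(payload)
clean.each { |field, text| puts "#{field}: #{text}" }
```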
That doesn't look like UTF-8 data, so this exception is expected. It sounds like you need to tell Ruby which encoding the string is actually in:
some_string.force_encoding("windows-1254")
You can then convert it to UTF-8 with the encode method. If you're getting a mix of encodings, there are gems (e.g. charlock_holmes) with heuristics for detecting encodings automatically.
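The two steps in this answer do different things: force_encoding only changes the label Ruby attaches to the bytes, while encode rewrites the bytes into the target encoding. A small sketch on the question's last name makes that visible through bytesize. Note that force_encoding modifies its receiver, hence the dup.

```ruby
raw = "\xFC\xFE\xE7\xF0i\xFE\xFE\xF6\xE7"

# Step 1: relabel a copy as windows-1254; the 9 bytes are unchanged.
labelled = raw.dup.force_encoding("windows-1254")
puts labelled.encoding          # Windows-1254
puts labelled.valid_encoding?   # true
puts labelled.bytesize          # 9

# Step 2: transcode to UTF-8; each Turkish letter now takes 2 bytes.
utf8 = labelled.encode("UTF-8")
puts utf8                       # üşçğişşöç
puts utf8.bytesize              # 17
```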