

Ruby 2.1.5 - ArgumentError: invalid byte sequence in UTF-8

I'm having trouble with UTF-8 characters in Ruby 2.1.5 and Rails 4.

The problem is that the data coming from an external service looks like this:

"first_name"=>"ezgi \xE7enberci"
"last_name" => "\xFC\xFE\xE7\xF0i\xFE\xFE\xF6\xE7"

These are mostly Turkish alphabet characters such as "üğşiçö". When the application tries to save this data, the following errors occur:

ArgumentError: invalid byte sequence in UTF-8
Mysql2::Error: Incorrect string value

How can I fix this?

What's Wrong

Ruby thinks you have invalid byte sequences because your strings aren't UTF-8. For example, using the rchardet gem:

require 'chardet'
["ezgi \xE7enberci", "\xFC\xFE\xE7\xF0i\xFE\xFE\xF6\xE7"].map do |str|
  CharDet.detect(str)  # hash with the guessed encoding and a confidence score
end

#=> [{"encoding"=>"ISO-8859-2", "confidence"=>0.8600826867857209}, {"encoding"=>"windows-1255", "confidence"=>0.5807177322740268}]
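
If you don't want to install a detection gem just to confirm the diagnosis, core Ruby can already tell you that the bytes are not valid UTF-8. A minimal sketch, assuming the strings reach your code tagged as UTF-8 (which is what byte escapes in a UTF-8 source file produce); the variable name names is only for illustration:

```ruby
# Byte escapes in a UTF-8 source file produce strings tagged as UTF-8,
# even when the bytes themselves are not valid UTF-8.
names = ["ezgi \xE7enberci", "\xFC\xFE\xE7\xF0i\xFE\xFE\xF6\xE7"]

names.each do |name|
  # valid_encoding? checks the bytes against the string's tagged encoding.
  puts "#{name.inspect}: encoding=#{name.encoding}, valid=#{name.valid_encoding?}"
end
```

This only tells you that the data is not UTF-8. It says nothing about which encoding the data actually uses.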

How to Fix It

You need to clean up your strings first, using String#scrub or one of the encoding methods such as String#encode!. For example:

hash = {"first_name"=>"ezgi \xE7enberci",
        "last_name"=>"\xFC\xFE\xE7\xF0i\xFE\xFE\xF6\xE7"}
# Treat each value's bytes as ISO-8859-2 and convert them to UTF-8 in place.
hash.each_value { |v| v.encode!("UTF-8", "ISO-8859-2") }
#=> {"first_name"=>"ezgi çenberci", "last_name"=>"üţçđiţţöç"}
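
For comparison, here is a rough sketch of the String#scrub route. It makes the string valid UTF-8 by replacing the offending bytes, so the original characters are discarded rather than recovered. The "?" replacement is an arbitrary choice for this example:

```ruby
raw = "ezgi \xE7enberci"      # \xE7 is a UTF-8 lead byte with no continuation bytes

# Replace every invalid byte sequence with "?" and return a new string.
scrubbed = raw.scrub("?")

puts scrubbed                 # the original character is gone, not converted
puts scrubbed.valid_encoding?
```

That makes scrub a reasonable last resort when the source encoding is truly unknown, while encode is the better fit when you can name it.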

Obviously, you may need to experiment a bit to figure out what the proper encoding is (e.g. ISO-8859-2, windows-1255, or something else entirely), but ensuring that your data set has a consistent encoding is going to be critical.
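
One way to run that experiment is to decode the same bytes with several candidate encodings and read the results side by side. A small sketch; the candidate list is my own assumption, chosen because the question mentions Turkish characters:

```ruby
raw = "\xFC\xFE\xE7\xF0i\xFE\xFE\xF6\xE7"

# Candidate source encodings to compare; this list is an assumption.
candidates = ["ISO-8859-2", "ISO-8859-9", "Windows-1254"]

results = candidates.map do |name|
  # encode (without the bang) returns a new string, so raw stays untouched.
  [name, raw.encode("UTF-8", name)]
end

results.each { |name, text| puts "#{name}: #{text}" }
```

For these particular bytes, ISO-8859-9 and Windows-1254 both produce ş and ğ where ISO-8859-2 produces ţ and đ, which fits the Turkish letters the question expects.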

Character encoding detection is imperfect. Your best bet is to find out which encoding your external data source actually uses and pass that to your encoding call, rather than trying to detect it automatically. Otherwise, your mileage may vary.

That doesn't look like UTF-8 data, so this exception is expected. It sounds like you need to tell Ruby which encoding the string is actually in:

some_string.force_encoding("windows-1254")

You can then convert to UTF-8 with the encode method. If you are getting a mix of encodings, there are gems (e.g. charlock_holmes) that use heuristics to detect the encoding automatically.
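
Put together, the two steps might look like the sketch below. It assumes the source really is Windows-1254; the variable names are illustrative:

```ruby
raw = "ezgi \xE7enberci"

# force_encoding only relabels the existing bytes; nothing is converted.
# dup keeps the original string untouched.
tagged = raw.dup.force_encoding("Windows-1254")

# encode transcodes the relabelled bytes into real UTF-8.
utf8 = tagged.encode("UTF-8")

puts utf8
puts utf8.encoding
puts utf8.valid_encoding?
```

The distinction matters: force_encoding alone leaves the bytes as they were, so calling it with "UTF-8" on this data would not fix anything.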

