简体   繁体   中英

Ruby 'to_json' throws ArgumentError: invalid byte sequence in UTF-8

In rails console, I get:

hash = {"name"=>"სსიპ ოთარ ჩხეიძის სახელობის სოფელ ყე\xE1\x83"}
#=> {"name"=>"სსიპ ოთარ ჩხეიძის სახელობის სოფელ ყე\xE1\x83"}
hash.to_json
#>> ArgumentError: invalid byte sequence in UTF-8
from /home/edmodo/.rvm/gems/ruby-2.3.0@one-eye/gems/activesupport-json_encoder-1.1.0/lib/active_support/json/encoding/active_support_encoder.rb:79:in `gsub'

"\\xE1\\x83".to_json 's not working may be due to non UTF-8 characters.

Any help is appreciated.

If hash is converted to a string, then it works, but it adds garbage characters like u003E with lots of extra backslashes.

hash.to_s.to_json
#=> "\"{\\\"name\\\"=\\u003E\\\"სსიპ ოთარ ჩხეიძის სახელობის სოფელ ყე\\\\xE1\\\\x83\\\"}\""

That is because your input String contains invalid byte sequence in UTF-8 , as the error message precisely tells. You can check it like

hash['name'].valid_encoding?  # => false

Fundamentally, you should fix the input string, removing all the invalid byte sequence characters; in your example, it is "\\xE1\\x83"

If for some reason you need to preserve the byte sequence and encode it to a standard JSON, I think you must encode the string first, because JSON does not accept a binary data but valid UTF-8 strings only. Note a string with an invalid byte sequence is a binary data, as far as JSON is concerned.

In Rails, you can use Base64 encoding as follows:

hash['name'] = Base64.encode64 hash['name']
hash.to_json  # => a valid JSON

In decoding, you must specify the encoding, such as,

hj = hash.to_json
Base64.decode64(JSON.parse(hj)['name']).force_encoding('UTF-8') # => Decoded string

Note the reproduced string is NOT a valid UTF-8 in your case anyway. But it would help to display in Rails console.

If you are not afraid of losing content, may use this solution:

pry(main)> 
{"name"=>"სსიპ ოთარ ჩხეიძის სახელობის სოფელ ყე\xE1\x83".force_encoding("ASCII-8BIT").encode('UTF-8', undef: :replace, replace: '')}.to_json

=> "{\"name\":\"     \"}"
require 'json'

def cleanup(string)
  text = ''
  string.each_char { |char| text << char if char.valid_encoding? }
  text
end

hash = { "name" => "სსიპ ოთარ ჩხეიძის სახელობის სოფელ ყე\xE1\x83" }
hash.transform_values! { |value| cleanup(value) }

puts hash.to_json

{"name":"სსიპ ოთარ ჩხეიძის სახელობის სოფელ ყე"}

Thank you Stefan, Masa Sakano and Alexey Strizhak. Your suggestions helped me a lot. This is correct that the string has invalid byte sequence characters. What I did is just to keep valid encoding characters as below -

"სსიპ ოთარ ჩხეიძის სახელობის სოფელ ყე\xE1\x83".chars.select(&:valid_encoding?).join
=> "სსიპ ოთარ ჩხეიძის სახელობის სოფელ ყე"

This will remove the incomplete/invalid characters like "\\xE1\\x83".

Again thanks a lot everyone for helping me out to understand the problem and suggesting solutions.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM