Ruby 'to_json' throws ArgumentError: invalid byte sequence in UTF-8

Question

In the Rails console, I get:

hash = {"name"=>"სსიპ ოთარ ჩხეიძის სახელობის სოფელ ყე\xE1\x83"}
#=> {"name"=>"სსიპ ოთარ ჩხეიძის სახელობის სოფელ ყე\xE1\x83"}
hash.to_json
#>> ArgumentError: invalid byte sequence in UTF-8
from /home/edmodo/.rvm/gems/ruby-2.3.0@one-eye/gems/activesupport-json_encoder-1.1.0/lib/active_support/json/encoding/active_support_encoder.rb:79:in `gsub'

"\xE1\x83".to_json does not work either, possibly because of non-UTF-8 characters.

Any help is appreciated.

If the hash is converted to a string first, to_json works, but the output contains garbage such as \u003E along with lots of extra backslashes.

hash.to_s.to_json
#=> "\"{\\\"name\\\"=\\u003E\\\"სსიპ ოთარ ჩხეიძის სახელობის სოფელ ყე\\\\xE1\\\\x83\\\"}\""

Answer 1

That is because your input String contains an invalid byte sequence in UTF-8, exactly as the error message says. You can check it like this:

hash['name'].valid_encoding?  # => false

Fundamentally, you should fix the input string by removing all the invalid byte sequences; in your example, that is "\xE1\x83".
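One way to do that removal, as a minimal sketch: Ruby's built-in String#scrub (available since Ruby 2.1) returns a copy in which invalid byte sequences are replaced, here with the empty string. The shortened sample string and the variable names are only illustrative, and plain `require 'json'` is assumed instead of the Rails encoder from the question.

```ruby
require 'json'

# Shortened, illustrative stand-in for the string in the question:
# two valid Georgian letters followed by a truncated 3-byte sequence.
raw = "ყე\xE1\x83"

puts raw.valid_encoding?    # false

# String#scrub returns a copy with invalid bytes replaced; '' drops them.
clean = raw.scrub('')

puts clean.valid_encoding?  # true
puts({ "name" => clean }.to_json)
```

Passing a different replacement, such as "?", keeps a visible marker where bytes were dropped.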

If for some reason you need to preserve the byte sequence and encode it as standard JSON, I think you must encode the string first, because JSON does not accept binary data, only valid UTF-8 strings. Note that, as far as JSON is concerned, a string with an invalid byte sequence is binary data.

In Rails, you can use Base64 encoding as follows:

hash['name'] = Base64.encode64 hash['name']
hash.to_json  # => a valid JSON

When decoding, you must specify the encoding, for example:

hj = hash.to_json
Base64.decode64(JSON.parse(hj)['name']).force_encoding('UTF-8') # => Decoded string

Note that the reproduced string is still NOT valid UTF-8 in your case. But it would help to display it in the Rails console.
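The round trip above can be put together in one self-contained sketch. It assumes plain Ruby with the json and base64 libraries rather than Rails, uses a shortened sample string, and swaps in Base64.strict_encode64 / strict_decode64 (which do not insert line breaks) for encode64 / decode64; the variable names are illustrative.

```ruby
require 'json'
require 'base64'

raw = "ყე\xE1\x83"   # shortened sample; ends in an incomplete UTF-8 sequence

# Base64 output is plain ASCII, so it is always safe to embed in JSON.
payload = { "name" => Base64.strict_encode64(raw) }.to_json

# Decoding yields a binary (ASCII-8BIT) string; tag it as UTF-8 again.
restored = Base64.strict_decode64(JSON.parse(payload)["name"]).force_encoding("UTF-8")

puts restored.bytes == raw.bytes   # true: every byte survived the round trip
puts restored.valid_encoding?      # false: the trailing bytes are still invalid
```

Whoever consumes the JSON has to know that the field is Base64-encoded, since nothing in the JSON itself says so.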

Answer 2

If you are not afraid of losing content, you may use this solution:

pry(main)> 
{"name"=>"სსიპ ოთარ ჩხეიძის სახელობის სოფელ ყე\xE1\x83".force_encoding("ASCII-8BIT").encode('UTF-8', undef: :replace, replace: '')}.to_json

=> "{\"name\":\"     \"}"
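To see where the content goes, here is a small sketch of the same conversion on a shortened, illustrative string. Once the string is tagged as ASCII-8BIT, every byte above 0x7F has no defined mapping to UTF-8, so undef: :replace drops the valid Georgian letters along with the broken bytes and only ASCII survives.

```ruby
raw = "ყე\xE1\x83 ok"   # illustrative: Georgian letters, broken bytes, ASCII text

# dup first so the original string keeps its UTF-8 tag.
binary = raw.dup.force_encoding("ASCII-8BIT")

# Bytes >= 0x80 are undefined in the binary -> UTF-8 conversion and get replaced by ''.
ascii_only = binary.encode("UTF-8", undef: :replace, replace: "")

p ascii_only                  # " ok"
p ascii_only.valid_encoding?  # true
```

That is why the result above contains nothing but the spaces between the original words.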

Answer 3

require 'json'

# Rebuild the string from only those characters that are validly encoded.
def cleanup(string)
  text = ''
  string.each_char { |char| text << char if char.valid_encoding? }
  text
end

hash = { "name" => "სსიპ ოთარ ჩხეიძის სახელობის სოფელ ყე\xE1\x83" }
hash.transform_values! { |value| cleanup(value) }

puts hash.to_json

{"name":"სსიპ ოთარ ჩხეიძის სახელობის სოფელ ყე"}

Answer 4

Thank you Stefan, Masa Sakano and Alexey Strizhak. Your suggestions helped me a lot. It is correct that the string has invalid byte sequence characters. What I did was simply keep the validly encoded characters, as below:

"სსიპ ოთარ ჩხეიძის სახელობის სოფელ ყე\xE1\x83".chars.select(&:valid_encoding?).join
=> "სსიპ ოთარ ჩხეიძის სახელობის სოფელ ყე"

This removes incomplete/invalid characters like "\xE1\x83".

Again, thanks a lot everyone for helping me understand the problem and suggesting solutions.

4 answers

Answer 1
2 2018-10-30 14:04:00

Answer 2
0 2018-10-30 19:04:02

Answer 3
0 2018-10-31 16:55:13

Answer 4
0 2018-11-01 09:05:41
