Ruby 2.0.0 String#match ArgumentError: invalid byte sequence in UTF-8

I see this a lot and haven't figured out a graceful solution. If user input contains invalid byte sequences, I need to handle it without raising an exception. For example:

# @raw_response comes from the user and contains invalid UTF-8,
# for example: @raw_response = "\xBF"
regex.match(@raw_response)
# raises ArgumentError: invalid byte sequence in UTF-8

Many similar questions have been asked, and the answer usually seems to be to encode or force-encode the string. Neither of these works for me, however:

regex.match(@raw_response.force_encoding("UTF-8"))
# raises ArgumentError: invalid byte sequence in UTF-8

or

regex.match(@raw_response.encode("UTF-8", :invalid=>:replace, :replace=>"?"))
# raises ArgumentError: invalid byte sequence in UTF-8

Is this a bug in Ruby 2.0.0, or am I missing something?

What is strange is that the string appears to be encoded correctly, yet match continues to raise an exception:

@raw_response.encode("UTF-8", :invalid=>:replace, :replace=>"?").encoding
# => #<Encoding:UTF-8>

In Ruby 2.0, the encode method is a no-op when encoding a string to its current encoding:

Please note that conversion from an encoding enc to the same encoding enc is a no-op, i.e. the receiver is returned without any changes, and no exceptions are raised, even if there are invalid bytes.
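A small sketch of why the attempts in the question leave the string unchanged, assuming a UTF-8 source file and a made-up sample string. force_encoding only relabels the bytes, and encode to the string's own encoding without :invalid=>:replace skips conversion entirely (in every Ruby version, not just 2.0). Since #encoding only reports the label, valid_encoding? is the check that actually reveals the stray byte.

```ruby
raw = "abc\xBFdef" # hypothetical input: UTF-8 with one stray byte (0xBF)

forced = raw.dup.force_encoding("UTF-8") # relabels only; the bytes are untouched
same   = raw.encode("UTF-8")             # same source and target encoding: no conversion

# #encoding only reports the label, so every copy claims to be UTF-8 ...
p [raw.encoding, forced.encoding, same.encoding]
# ... while #valid_encoding? shows the invalid byte is still present.
p [raw.valid_encoding?, forced.valid_encoding?, same.valid_encoding?] # [false, false, false]
```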

This changed in Ruby 2.1, which also added the scrub method as an easier way to do this.
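A minimal sketch of the Ruby 2.1+ options, assuming a UTF-8 source file; the sample string and the pattern /def/ are made up for illustration. It reproduces the original error, then cleans the string with String#scrub and with the 2.1+ behaviour of encode plus :invalid=>:replace. It will not run on 2.0, where scrub does not exist.

```ruby
# Requires Ruby 2.1+: String#scrub does not exist in 2.0.
raw = "abc\xBFdef" # hypothetical input: UTF-8 with one stray byte (0xBF)

# Matching against a string with a broken byte sequence raises.
raised = begin
  /def/.match(raw)
  false
rescue ArgumentError => e
  puts e.message # invalid byte sequence in UTF-8
  true
end

# scrub replaces each invalid byte sequence with the given string.
clean = raw.scrub("?")

# Since 2.1, a same-encoding encode with :invalid => :replace also scrubs.
via_encode = raw.encode("UTF-8", :invalid => :replace, :replace => "?")

m = /def/.match(clean) # no exception now
puts clean      # abc?def
puts via_encode # abc?def
puts m[0]       # def
```

Without an argument, scrub substitutes U+FFFD (the Unicode replacement character) instead of "?".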

If you are unable to upgrade to 2.1, you'll have to encode into a different encoding and back in order to remove invalid bytes, something like:

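# Assumes s is labelled UTF-8. Converting to a different encoding (UTF-16BE)
# and back forces a real conversion, so invalid bytes actually get replaced.
unless s.valid_encoding?
  s = s.encode("UTF-16be", :invalid=>:replace, :replace=>"?").encode('UTF-8')
end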
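The same workaround can be wrapped in a small helper that prefers scrub when it exists and falls back to the UTF-16BE round-trip on 2.0. This is a sketch under stated assumptions: the names sanitize_utf8 and sanitize_utf8_roundtrip are hypothetical, and the input bytes are assumed to be intended as UTF-8, so the string is relabelled with force_encoding before it is checked.

```ruby
# Hypothetical helper, Ruby 2.0-compatible: transcode via UTF-16BE so the
# conversion is not a same-encoding no-op and invalid bytes get replaced.
def sanitize_utf8_roundtrip(str, replacement = "?")
  utf8 = str.dup.force_encoding(Encoding::UTF_8) # assumption: bytes are meant as UTF-8
  return utf8 if utf8.valid_encoding?

  utf8.encode(Encoding::UTF_16BE, invalid: :replace, replace: replacement)
      .encode(Encoding::UTF_8)
end

# Hypothetical helper: use String#scrub on 2.1+, otherwise the round-trip.
def sanitize_utf8(str, replacement = "?")
  utf8 = str.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?
  return utf8.scrub(replacement) if utf8.respond_to?(:scrub)

  sanitize_utf8_roundtrip(utf8, replacement)
end

puts sanitize_utf8("abc\xBFdef")           # abc?def
puts sanitize_utf8_roundtrip("abc\xBFdef") # abc?def
```

Both paths return a copy, so the caller's original string is never modified in place.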

Since you're using Rails and not just plain Ruby, you can also use tidy_bytes. This works with Ruby 2.0 and will probably give you back sensible data instead of just replacement characters.
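tidy_bytes comes from ActiveSupport (reached through String#mb_chars, as in str.mb_chars.tidy_bytes). Rather than substituting a fixed character, it treats stray bytes as CP1252/ISO-8859-1 text and converts them to their UTF-8 equivalents. The sketch below is not ActiveSupport's code. It is a hypothetical plain-Ruby approximation of that idea: the helper name cp1252_fallback is made up, and it relies on the block form of String#scrub, so it needs Ruby 2.1+.

```ruby
# Hypothetical approximation of the idea behind ActiveSupport's tidy_bytes:
# keep valid UTF-8 untouched and reinterpret each invalid byte sequence as
# Windows-1252 (CP1252) text. Requires Ruby 2.1+ for String#scrub.
def cp1252_fallback(str)
  utf8 = str.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  utf8.scrub do |bad|
    # A few bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) are undefined in CP1252,
    # so those become "?" instead of raising.
    bad.dup.force_encoding(Encoding::Windows_1252)
       .encode(Encoding::UTF_8, :undef => :replace, :replace => "?")
  end
end

puts cp1252_fallback("caf\xE9") # café
puts cp1252_fallback("\x80")    # €
```

Because this guesses at the original encoding, genuinely random bytes can come back as unexpected accented characters. scrub with a fixed replacement is the more predictable choice when the input is not likely to be Windows-1252.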
