
Ruby on Rails “invalid byte sequence in UTF-8” due to bot

I am getting errors triggered by a Chinese bot (http://www.easou.com/search/spider.html) when it crawls my websites.

All of my applications run Ruby 1.9.3 and Rails 3.2.X.

Here is a stack trace:

An ArgumentError occurred in listings#show:

  invalid byte sequence in UTF-8
  rack (1.4.5) lib/rack/utils.rb:104:in `normalize_params'


-------------------------------
Request:
-------------------------------

  * URL       : http://www.my-website.com
  * IP address: X.X.X.X
  * Parameters: {"action"=>"show", "controller"=>"listings", "id"=>"location-t7-villeurbanne--58"}
  * Rails root: /.../releases/20140708150222
  * Timestamp : 2014-07-09 02:57:43 +0200

-------------------------------
Backtrace:
-------------------------------

  rack (1.4.5) lib/rack/utils.rb:104:in `normalize_params'
  rack (1.4.5) lib/rack/utils.rb:96:in `block in parse_nested_query'
  rack (1.4.5) lib/rack/utils.rb:93:in `each'
  rack (1.4.5) lib/rack/utils.rb:93:in `parse_nested_query'
  rack (1.4.5) lib/rack/request.rb:332:in `parse_query'
  actionpack (3.2.18) lib/action_dispatch/http/request.rb:275:in `parse_query'
  rack (1.4.5) lib/rack/request.rb:209:in `POST'
  actionpack (3.2.18) lib/action_dispatch/http/request.rb:237:in `POST'
  actionpack (3.2.18) lib/action_dispatch/http/parameters.rb:10:in `parameters'

-------------------------------
Session:
-------------------------------

  * session id: nil
  * data: {}

-------------------------------
Environment:
-------------------------------

  * CONTENT_LENGTH                                 : 514
  * CONTENT_TYPE                                   : application/x-www-form-urlencoded
  * HTTP_ACCEPT                                    : text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1
  * HTTP_ACCEPT_ENCODING                           : gzip, deflate
  * HTTP_ACCEPT_LANGUAGE                           : zh;q=0.9,en;q=0.8
  * HTTP_CONNECTION                                : close
  * HTTP_HOST                                      : www.my-website.com
  * HTTP_REFER                                     : http://www.my-website.com/
  * HTTP_USER_AGENT                                : Mozilla/5.0 (compatible; EasouSpider; +http://www.easou.com/search/spider.html)
  * ORIGINAL_FULLPATH                              : /
  * PASSENGER_APP_SPAWNER_IDLE_TIME                : -1
  * PASSENGER_APP_TYPE                             : rack
  * PASSENGER_CONNECT_PASSWORD                     : [FILTERED]
  * PASSENGER_DEBUGGER                             : false
  * PASSENGER_ENVIRONMENT                          : production
  * PASSENGER_FRAMEWORK_SPAWNER_IDLE_TIME          : -1
  * PASSENGER_FRIENDLY_ERROR_PAGES                 : true
  * PASSENGER_GROUP                                :
  * PASSENGER_MAX_REQUESTS                         : 0
  * PASSENGER_MIN_INSTANCES                        : 1
  * PASSENGER_SHOW_VERSION_IN_HEADER               : true
  * PASSENGER_SPAWN_METHOD                         : smart-lv2
  * PASSENGER_USER                                 :
  * PASSENGER_USE_GLOBAL_QUEUE                     : true
  * PATH_INFO                                      : /
  * QUERY_STRING                                   :
  * REMOTE_ADDR                                    : 183.60.212.153
  * REMOTE_PORT                                    : 52997
  * REQUEST_METHOD                                 : GET
  * REQUEST_URI                                    : /
  * SCGI                                           : 1
  * SCRIPT_NAME                                    :
  * SERVER_PORT                                    : 80
  * SERVER_PROTOCOL                                : HTTP/1.1
  * SERVER_SOFTWARE                                : nginx/1.2.6
  * UNION_STATION_SUPPORT                          : false
  * _                                              : _
  * action_controller.instance                     : listings#show
  * action_dispatch.backtrace_cleaner              : #<Rails::BacktraceCleaner:0x000000056e8660>
  * action_dispatch.cookies                        : #<ActionDispatch::Cookies::CookieJar:0x00000006564e28>
  * action_dispatch.logger                         : #<ActiveSupport::TaggedLogging:0x0000000318aff8>
  * action_dispatch.parameter_filter               : [:password, /RAW_POST_DATA/, /RAW_POST_DATA/, /RAW_POST_DATA/]
  * action_dispatch.remote_ip                      : 183.60.212.153
  * action_dispatch.request.content_type           : application/x-www-form-urlencoded
  * action_dispatch.request.parameters             : {"action"=>"show", "controller"=>"listings", "id"=>"location-t7-villeurbanne--58"}
  * action_dispatch.request.path_parameters        : {:action=>"show", :controller=>"listings", :id=>"location-t7-villeurbanne--58"}
  * action_dispatch.request.query_parameters       : {}
  * action_dispatch.request.request_parameters     : {}
  * action_dispatch.request.unsigned_session_cookie: {}
  * action_dispatch.request_id                     : 9f8afbc8ff142f91ddbd9cabee3629f3
  * action_dispatch.routes                         : #<ActionDispatch::Routing::RouteSet:0x0000000339f370>
  * action_dispatch.show_detailed_exceptions       : false
  * action_dispatch.show_exceptions                : true
  * rack-cache.allow_reload                        : false
  * rack-cache.allow_revalidate                    : false
  * rack-cache.cache_key                           : Rack::Cache::Key
  * rack-cache.default_ttl                         : 0
  * rack-cache.entitystore                         : rails:/
  * rack-cache.ignore_headers                      : ["Set-Cookie"]
  * rack-cache.metastore                           : rails:/
  * rack-cache.private_headers                     : ["Authorization", "Cookie"]
  * rack-cache.storage                             : #<Rack::Cache::Storage:0x000000039c5768>
  * rack-cache.use_native_ttl                      : false
  * rack-cache.verbose                             : false
  * rack.errors                                    : #<IO:0x000000006592a8>
  * rack.input                                     : #<PhusionPassenger::Utils::RewindableInput:0x0000000655b3a0>
  * rack.multiprocess                              : true
  * rack.multithread                               : false
  * rack.request.cookie_hash                       : {}
  * rack.request.form_hash                         :
  * rack.request.form_input                        : #<PhusionPassenger::Utils::RewindableInput:0x0000000655b3a0>
  * rack.request.form_vars                         : ���W�"��陷q�B��)���
�F��P   Z� 8�� &   G\y�P��u�T ed �.�%�mxEAẳ\�d*�Hg�     �C賳�lj��� � U 1��]pgt�P�
  Ɗ    ��c"� ��LX��D���HR�y��p`6�l���lN�P �l�S����`V4y��c����X2�        &JO!��*p �l��-�гU��w }g�ԍk�� (� F J��  q�:�5G�Jh�pί����ࡃ]                                                                                                                                                                                                                                                                           �z�h���� d }�}
  * rack.request.query_hash                        : {}
  * rack.request.query_string                      :
  * rack.run_once                                  : false
  * rack.session                                   : {}
  * rack.session.options                           : {:path=>"/", :domain=>nil, :expire_after=>nil, :secure=>false, :httponly=>true, :defer=>false, :renew=>false, :coder=>#<Rack::Session::Cookie::Base64::Marshal:0x000000034d4ad8>, :id=>nil}
  * rack.url_scheme                                : http
  * rack.version                                   : [1, 0]

As you can see, there is no invalid UTF-8 in the URL, only in rack.request.form_vars. I get about a hundred of these errors per day, all similar to this one.

So I tried to force UTF-8 on rack.request.form_vars with something like this:

class RackFormVarsSanitizer
  def initialize(app)
    @app = app
  end

  def call(env)
    if env["rack.request.form_vars"] 
      env["rack.request.form_vars"] = env["rack.request.form_vars"].force_encoding('UTF-8')
    end
    @app.call(env)
  end
end

And I register it in my application.rb:

config.middleware.use "RackFormVarsSanitizer"

It doesn't seem to work, because I still get the errors. The problem is that I can't test it in development mode, because I don't know how to set rack.request.form_vars.
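One likely reason the middleware above has no effect: force_encoding only changes the encoding label on a string and leaves the bytes untouched, so invalid bytes stay invalid. (It also looks as if rack.request.form_vars is only filled in once Rack parses the body, so it may still be nil when the middleware runs; that part is an assumption, not something verified against Rack 1.4.5.) The sketch below uses made-up sample bytes and hypothetical helper names to show the relabelling problem in plain Ruby. The failing frame in the trace, normalize_params, appears to apply a regex to each parameter name, which is why a regex match is used here to provoke the same message.

```ruby
# Hypothetical sample bytes that are not valid UTF-8, standing in for the
# binary body shown under rack.request.form_vars in the trace.
RAW_BODY = [0xE9, 0x99, 0x71, 0xFF, 0x3D, 0x80].pack('C*') # ASCII-8BIT

# Relabel a copy as UTF-8 without touching the bytes (what force_encoding does).
def relabel_as_utf8(bytes)
  bytes.dup.force_encoding('UTF-8')
end

# Run a regex match and return the ArgumentError message, or nil if none is raised.
def regex_error_for(str)
  str =~ /=/
  nil
rescue ArgumentError => e
  e.message
end

# Drop the invalid bytes. String#scrub needs Ruby 2.1+, so it is not
# available on the 1.9.3 setup from the question.
def drop_invalid_bytes(bytes)
  relabel_as_utf8(bytes).scrub('')
end

relabelled = relabel_as_utf8(RAW_BODY)
puts relabelled.valid_encoding?                    # false: the bytes are unchanged
puts regex_error_for(relabelled).inspect           # the ArgumentError message
puts drop_invalid_bytes(RAW_BODY).valid_encoding?  # true
```

On Ruby 1.9.3, where scrub does not exist, the bytes would have to be replaced some other way (for example with String#encode and its :invalid option), which is not shown here.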

I installed the utf8-cleaner gem, but it fixes nothing.

Does anybody have an idea how to fix this, or how to trigger it in development?
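On triggering it in development: a rough sketch, assuming the app is running locally at http://localhost:3000/ (an assumed address), is to send a request with the same Content-Type as in the trace and a body of bytes that are not valid UTF-8. The bot's request in the trace is a GET with a body; this sketch uses POST, which should go through the same form-parsing path but has not been checked against Rack 1.4.5. The helper names are hypothetical.

```ruby
require 'net/http'
require 'uri'

# Build a form-encoded request whose body is not valid UTF-8.
# The bytes are made-up sample data, not the bot's actual payload.
def build_bad_request(url)
  uri = URI(url)
  req = Net::HTTP::Post.new(uri.request_uri)
  req['Content-Type'] = 'application/x-www-form-urlencoded'
  req.body = [0xE9, 0x99, 0x71, 0xFF, 0x3D, 0x80].pack('C*')
  [uri, req]
end

# Send the request and return the HTTP status code as a string.
def send_bad_request(url)
  uri, req = build_bad_request(url)
  Net::HTTP.start(uri.host, uri.port) { |http| http.request(req).code }
end

# Usage, with the development server running (the URL is an assumption):
#   puts send_bad_request('http://localhost:3000/')
```

If the error reproduces, the development log should show the same ArgumentError; the same script can then be rerun to check whether a fix returns something other than a 500.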

So you don't have to piece together the comments in my other reply, this is what I'm doing now. I've seen no errors for 24 hours, so it looks very promising:

Add rack-utf8_sanitizer to your Gemfile:

gem 'rack-utf8_sanitizer'

and run

bundle

Put this middleware in app/middleware/handle_invalid_percent_encoding.rb and rename the class HandleInvalidPercentEncoding (because ExceptionApp is a bit too general).

In the config block of config/application.rb do:

require "#{Rails.root}/app/middleware/handle_invalid_percent_encoding.rb"


# NOTE: These must be in this order relative to each other.
# HandleInvalidPercentEncoding just raises for encoding errors it doesn't cover,
# so it must run after (= be inserted before) Rack::UTF8Sanitizer.
config.middleware.insert 0, HandleInvalidPercentEncoding
config.middleware.insert 0, Rack::UTF8Sanitizer  # from a gem

Deploy. Done.

(app happens to be the location for middleware in the project I'm working on, but I'd probably prefer lib. Whatever. Either should work.)
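The middleware referred to above is linked rather than reproduced here. As a rough idea of what such a class might look like, here is a minimal sketch: it is not the linked code, and the matched error messages and the plain-text 400 response are assumptions. It lets the request through and turns encoding-related ArgumentErrors into a 400 instead of a 500.

```ruby
# Hypothetical stand-in for the linked middleware, not the original code.
# Converts encoding-related ArgumentErrors raised further down the stack
# into a 400 response and re-raises anything else.
class HandleInvalidPercentEncoding
  # Assumed message patterns for the errors of interest.
  ENCODING_ERROR = /invalid (%-encoding|byte sequence)/

  def initialize(app)
    @app = app
  end

  def call(env)
    @app.call(env)
  rescue ArgumentError => e
    # Unrelated ArgumentErrors are real bugs; let them propagate.
    raise unless e.message =~ ENCODING_ERROR

    [400, { 'Content-Type' => 'text/plain' }, ['Bad request']]
  end
end
```

Matching on the message keeps unrelated ArgumentErrors visible, at the cost of depending on the exact wording Ruby and Rack use for these errors.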

Add this line to your Gemfile, then run bundle in your terminal:

gem "handle_invalid_percent_encoding_requests"

This solution is based on Henrik's answer, turned into a Rails Engine gem.

There is an issue in the gem repo with a link to someone's possible solution. They say it works for them, but they're not sure whether it's a good solution.

I've yet to try it, but I think I will.
