简体   繁体   中英

Ruby on Rails “invalid byte sequence in UTF-8” due to bot

I have some errors triggered by a chinese bot: http://www.easou.com/search/spider.html when it scrolls my websites.

Versions of my applications are all with Ruby 1.9.3 and Rails 3.2.X

Here a stacktrace :

An ArgumentError occurred in listings#show:

  invalid byte sequence in UTF-8
  rack (1.4.5) lib/rack/utils.rb:104:in `normalize_params'


-------------------------------
Request:
-------------------------------

  * URL       : http://www.my-website.com
  * IP address: X.X.X.X
  * Parameters: {"action"=>"show", "controller"=>"listings", "id"=>"location-t7-villeurbanne--58"}
  * Rails root: /.../releases/20140708150222
  * Timestamp : 2014-07-09 02:57:43 +0200

-------------------------------
Backtrace:
-------------------------------

  rack (1.4.5) lib/rack/utils.rb:104:in `normalize_params'
  rack (1.4.5) lib/rack/utils.rb:96:in `block in parse_nested_query'
  rack (1.4.5) lib/rack/utils.rb:93:in `each'
  rack (1.4.5) lib/rack/utils.rb:93:in `parse_nested_query'
  rack (1.4.5) lib/rack/request.rb:332:in `parse_query'
  actionpack (3.2.18) lib/action_dispatch/http/request.rb:275:in `parse_query'
  rack (1.4.5) lib/rack/request.rb:209:in `POST'
  actionpack (3.2.18) lib/action_dispatch/http/request.rb:237:in `POST'
  actionpack (3.2.18) lib/action_dispatch/http/parameters.rb:10:in `parameters'

-------------------------------
Session:
-------------------------------

  * session id: nil
  * data: {}

-------------------------------
Environment:
-------------------------------

  * CONTENT_LENGTH                                 : 514
  * CONTENT_TYPE                                   : application/x-www-form-urlencoded
  * HTTP_ACCEPT                                    : text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1
  * HTTP_ACCEPT_ENCODING                           : gzip, deflate
  * HTTP_ACCEPT_LANGUAGE                           : zh;q=0.9,en;q=0.8
  * HTTP_CONNECTION                                : close
  * HTTP_HOST                                      : www.my-website.com
  * HTTP_REFER                                     : http://www.my-website.com/
  * HTTP_USER_AGENT                                : Mozilla/5.0 (compatible; EasouSpider; +http://www.easou.com/search/spider.html)
  * ORIGINAL_FULLPATH                              : /
  * PASSENGER_APP_SPAWNER_IDLE_TIME                : -1
  * PASSENGER_APP_TYPE                             : rack
  * PASSENGER_CONNECT_PASSWORD                     : [FILTERED]
  * PASSENGER_DEBUGGER                             : false
  * PASSENGER_ENVIRONMENT                          : production
  * PASSENGER_FRAMEWORK_SPAWNER_IDLE_TIME          : -1
  * PASSENGER_FRIENDLY_ERROR_PAGES                 : true
  * PASSENGER_GROUP                                :
  * PASSENGER_MAX_REQUESTS                         : 0
  * PASSENGER_MIN_INSTANCES                        : 1
  * PASSENGER_SHOW_VERSION_IN_HEADER               : true
  * PASSENGER_SPAWN_METHOD                         : smart-lv2
  * PASSENGER_USER                                 :
  * PASSENGER_USE_GLOBAL_QUEUE                     : true
  * PATH_INFO                                      : /
  * QUERY_STRING                                   :
  * REMOTE_ADDR                                    : 183.60.212.153
  * REMOTE_PORT                                    : 52997
  * REQUEST_METHOD                                 : GET
  * REQUEST_URI                                    : /
  * SCGI                                           : 1
  * SCRIPT_NAME                                    :
  * SERVER_PORT                                    : 80
  * SERVER_PROTOCOL                                : HTTP/1.1
  * SERVER_SOFTWARE                                : nginx/1.2.6
  * UNION_STATION_SUPPORT                          : false
  * _                                              : _
  * action_controller.instance                     : listings#show
  * action_dispatch.backtrace_cleaner              : #<Rails::BacktraceCleaner:0x000000056e8660>
  * action_dispatch.cookies                        : #<ActionDispatch::Cookies::CookieJar:0x00000006564e28>
  * action_dispatch.logger                         : #<ActiveSupport::TaggedLogging:0x0000000318aff8>
  * action_dispatch.parameter_filter               : [:password, /RAW_POST_DATA/, /RAW_POST_DATA/, /RAW_POST_DATA/]
  * action_dispatch.remote_ip                      : 183.60.212.153
  * action_dispatch.request.content_type           : application/x-www-form-urlencoded
  * action_dispatch.request.parameters             : {"action"=>"show", "controller"=>"listings", "id"=>"location-t7-villeurbanne--58"}
  * action_dispatch.request.path_parameters        : {:action=>"show", :controller=>"listings", :id=>"location-t7-villeurbanne--58"}
  * action_dispatch.request.query_parameters       : {}
  * action_dispatch.request.request_parameters     : {}
  * action_dispatch.request.unsigned_session_cookie: {}
  * action_dispatch.request_id                     : 9f8afbc8ff142f91ddbd9cabee3629f3
  * action_dispatch.routes                         : #<ActionDispatch::Routing::RouteSet:0x0000000339f370>
  * action_dispatch.show_detailed_exceptions       : false
  * action_dispatch.show_exceptions                : true
  * rack-cache.allow_reload                        : false
  * rack-cache.allow_revalidate                    : false
  * rack-cache.cache_key                           : Rack::Cache::Key
  * rack-cache.default_ttl                         : 0
  * rack-cache.entitystore                         : rails:/
  * rack-cache.ignore_headers                      : ["Set-Cookie"]
  * rack-cache.metastore                           : rails:/
  * rack-cache.private_headers                     : ["Authorization", "Cookie"]
  * rack-cache.storage                             : #<Rack::Cache::Storage:0x000000039c5768>
  * rack-cache.use_native_ttl                      : false
  * rack-cache.verbose                             : false
  * rack.errors                                    : #<IO:0x000000006592a8>
  * rack.input                                     : #<PhusionPassenger::Utils::RewindableInput:0x0000000655b3a0>
  * rack.multiprocess                              : true
  * rack.multithread                               : false
  * rack.request.cookie_hash                       : {}
  * rack.request.form_hash                         :
  * rack.request.form_input                        : #<PhusionPassenger::Utils::RewindableInput:0x0000000655b3a0>
  * rack.request.form_vars                         : ���W�"��陷q�B��)���
�F��P   Z� 8�� &   G\y�P��u�T ed �.�%�mxEAẳ\�d*�Hg�     �C賳�lj��� � U 1��]pgt�P�
  Ɗ    ��c"� ��LX��D���HR�y��p`6�l���lN�P �l�S����`V4y��c����X2�        &JO!��*p �l��-�гU��w }g�ԍk�� (� F J��  q�:�5G�Jh�pί����ࡃ]                                                                                                                                                                                                                                                                           �z�h���� d }�}
  * rack.request.query_hash                        : {}
  * rack.request.query_string                      :
  * rack.run_once                                  : false
  * rack.session                                   : {}
  * rack.session.options                           : {:path=>"/", :domain=>nil, :expire_after=>nil, :secure=>false, :httponly=>true, :defer=>false, :renew=>false, :coder=>#<Rack::Session::Cookie::Base64::Marshal:0x000000034d4ad8>, :id=>nil}
  * rack.url_scheme                                : http
  * rack.version                                   : [1, 0]

As you can see there is no invalid utf-8 in the url but only in the rack.request.form_vars . I have about hundred errors per days, and all similar as this one.

So, I tried to force utf-8 in rack.request.form_vars with something like this:

class RackFormVarsSanitizer
  def initialize(app)
    @app = app
  end

  def call(env)
    if env["rack.request.form_vars"] 
      env["rack.request.form_vars"] = env["rack.request.form_vars"].force_encoding('UTF-8')
    end
    @app.call(env)
  end
end

And I call it in my application.rb :

config.middleware.use "RackFormVarsSanitizer"

It doesn't seem to work because I already have errors. The problem is I can't test in development mode because I don't know how to set rack.request.form_vars .

I installed utf8-cleaner gem but it fixes nothing.

Somebody have an idea to fix this? or to trigger it in development?

So you don't have to piece together the comments in my other reply, this is what I'm doing now – I've seen no errors for 24 hours, so it looks very promising:

Add rack-utf8_sanitizer to your Gemfile:

gem 'rack-utf8_sanitizer'

and run

bundle

Put this middleware in app/middleware/handle_invalid_percent_encoding.rb and rename the class HandleInvalidPercentEncoding (because ExceptionApp is a bit too general).

In the config block of config/application.rb do:

require "#{Rails.root}/app/middleware/handle_invalid_percent_encoding.rb"


# NOTE: These must be in this order relative to each other.
# HandleInvalidPercentEncoding just raises for encoding errors it doesn't cover,
# so it must run after (= be inserted before) Rack::UTF8Sanitizer.
config.middleware.insert 0, HandleInvalidPercentEncoding
config.middleware.insert 0, Rack::UTF8Sanitizer  # from a gem

Deploy. Done.

( app happens to be the location for middleware in the project I'm working on, but I'd probably prefer lib . Whatever. Either should work.)

Add this line to your Gemfile , then run bundle in your terminal:

gem "handle_invalid_percent_encoding_requests"

This solution is based on Henrik's answer , turned into a Rails Engine gem .

There is an issue in the gem repo with a link to someone's possible solution – they say it works for them but they're not sure if it's a good solution.

I've yet to try it, but I think I will.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM