Data loss after deadlock - SQL Server 2008, Ruby on Rails, Phusion Passenger, Linux, FreeTDS
I am confronted with a mystery problem that caused a data loss in a Ruby on Rails intranet application that I am responsible for. Apologies if this is not strictly speaking about a programming problem - well at least I maintain the application's Ruby code. The problem has so far occurred three times in two years.
The environment:
Symptoms:
It looks to me as if something like this happened:
Would that make sense? I'm wondering if anybody had similar experiences or hints where I could look further into. I suspected a bug in Passenger which may have forked the file descriptor of the database connection, but I cannot reproduce it. Passenger seems to properly create new DB connections on every fork.
I am considering changing the database's isolation model to "read committed snapshot" to reduce the number of deadlocks, but I'm aware that this doesn't fix the root cause and that this might cause other problems to me.
I have now been able to track down the problem myself and would like to share the solution with anyone who might be facing similar problems, now or in future.
The following was happening:
The resulted in an ActiveRecord::StatementInvalid
exception like this:
A ActiveRecord::StatementInvalid occurred in (...): ODBC::Error: 37000 (1205) [FreeTDS][SQL Server]Transaction (Process ID 55) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.: UPDATE [(....tablename....)] SET [position] = 1 WHERE [id] = 795419 /usr/lib/ruby/gems/1.8/gems/activerecord-2.3.8/lib/active_record/connection_adapters/abstract_adapter.rb:221:in `log'
In ActiveRecord::ConnectionAdapaters::DatabaseStatements.transaction()
, the exception is handled the following way:
rescue Exception => database_transaction_rollback if transaction_open && !outside_transaction? transaction_open = false decrement_open_transactions if open_transactions == 0 rollback_db_transaction else rollback_to_savepoint end end raise unless database_transaction_rollback.is_a?(ActiveRecord::Rollback) end
transaction_open
is true
at this point. The method outside_transaction?
is implemented this way in the SQL Server adapter:
def outside_transaction? info_schema_query { select_value("SELECT @@TRANCOUNT") == 0 } end
@@TRANCOUNT
is 0 at this point because SQL Server has already rolled back the transaction - as shown in my debug output:
SQL (1.0ms) SELECT @@TRANCOUNT => TRANCOUNT=0
Therefore outside_transaction?
returns true
, and the exception handling code shown above does not perform a rollback. So far, so good.
The exception is re-thrown by the code shown above, and caught by my ApplicationController
because of:
class ApplicationController < ActionController::Base rescue_from Exception, :with => :render_error
The render_error
method stores the exception message in a flash variable:
flash[:exception_message] = exception.message
flash variables are stoerd in the session. And since I'm using the active_record_store
, the session data will be stored in the database table sessions
. (I'm actually using smart_session_store
, but this makes no difference in this respect). So another transaction is starting...
EXECUTE (1.2ms) BEGIN TRANSACTION SQL (1.1ms) SELECT session_id, data,id FROM sessions WHERE id=150091 EXECUTE (1.3ms) UPDATE sessions SET updated_at=CURRENT_TIMESTAMP, data='BAh7FDoWdW9faGlk(........)' WHERE id=150091 CACHE (0.0ms) SELECT @@TRANCOUNT => TRANCOUNT=0
Although a transaction has started, SELECT @@TRANCOUNT
returns 0 - the value is taken from the cache! This is where the disaster is taking its course.
The transaction
method now believes that the transaction is not active anymore and therefore does not do a COMMIT:
if outside_transaction? # (this does the SELECT @@TRANCOUNT) @open_transactions = 0 # Getting here! elsif transaction_open decrement_open_transactions begin if open_transactions == 0 commit_db_transaction # NOT getting here! else release_savepoint end rescue Exception => database_transaction_rollback if open_transactions == 0 rollback_db_transaction else rollback_to_savepoint end raise end end
end
Since the database connection is kept open (this seems to be done only in production mode, not in development), subsequent Rails actions processed by the same worker process all append to the transaction that was left open here. To the users it still looks like data is being processed normally since they see everything through the eyes of the transaction. Only this one worker process remains active - any other started worker processes are hanging since the open transaction locks all kinds of database resources. When I restarted the web server, the one responsive worker process was stopped and its transaction was rolled back. This is where the data loss became visible in the application.
I had a brief look at newer (Rails 3.x) versions of the code shown above and it looks to me like the problem could no longer occur with them. The transaction
method no longer seems to call outside_transaction?
but relies on the internally maintained transaction state. However, since I cannot upgrade at this time, I will fix the problem locally, probably by changing the transaction
method in a similar way it was done for Rails 3.x.
First, I will admit that I have not read thru this issue thoroughly, but I have gut feeling and a suggestion. I have seen issues like this and they typically happen in large legacy applications with many models, some of which have a connection pool to another database. In said scenario it is easy to reference a model that uses another connection during ActiveRecord's callback phase. This leads to cross database transactions which is not supported by ActiveRecord out of the box.
If you think this may be the cause review your code for the following. Make sure that you never use #establish_connection and if you do, only use it once in a base or connection class that others inherit from. I have a github gist here ( https://gist.github.com/metaskills/4065702 ) that shows what I am talking about. Second, make sure that your models are not talking to another model in another connection pool during save/transactions.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.