Rails / Postgres Lookup Performance

I have a status dashboard that shows the status of remote hardware devices that 'ping' the application every minute and log their status.

class Sensor < ActiveRecord::Base
  has_many :logs

  def most_recent_log
    logs.order("id DESC").first
  end
end

class Log < ActiveRecord::Base
  belongs_to :sensor
end 

Given I'm only interested in showing the current status, the dashboard only shows the most recent log for all sensors. This application has been running for a long time now and there are tens of millions of Log records.

The problem I have is that the dashboard takes around 8 seconds to load. From what I can tell, this is largely because there is an N+1 Query fetching these logs.

Completed 200 OK in 4729.5ms (Views: 4246.3ms | ActiveRecord: 480.5ms)

I do have the following index in place:

add_index "logs", ["sensor_id", "id"], :name => "index_logs_on_sensor_id_and_id", :order => {"id"=>:desc}

My controller / lookup code is the following:

class SensorsController < ApplicationController
  def index
    @sensors = Sensor.all
  end
end

  1. How do I make the load time reasonable?
  2. Is there a way to avoid the N+1 and preload just the latest log for each sensor?

I had thought of putting a latest_log_id reference on to Sensor and then updating this every time a new log for that sensor is posted - but something in my head is telling me that other developers would say this is a bad thing. Is this the case?

How are problems like this usually solved?

There are 2 relatively easy ways to do this:

  • Use ActiveRecord eager loading to pull in just the most recent logs
  • Roll your own mini eager loading system (as a Hash) for just this purpose

Basic ActiveRecord approach:

subquery = Log.group(:sensor_id).select("MAX(id)")
@sensors = Sensor.eager_load(:logs).where(logs: { id: subquery })

Note that you should NOT call your most_recent_log method for each sensor (that would trigger the N+1 all over again); use logs.first instead. Only the latest log for each sensor will actually be prefetched into the logs collection.

Rolling your own may be more efficient from a SQL perspective, but more complex to read and use:

@sensors = Sensor.all
logs = Log.where(id: Log.group(:sensor_id).select("MAX(id)"))
@sensor_logs = logs.each_with_object({}){|log, hash|
  hash[log.sensor_id] = log
}

@sensor_logs is a Hash, permitting a fast lookup of the latest log by sensor.id.
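The lookup pattern itself can be illustrated without Rails. A minimal plain-Ruby sketch, using Struct stand-ins for the models (names and statuses here are illustrative only, not the application's actual code):

```ruby
# Plain-Ruby illustration of the Hash-lookup pattern: Struct stands in
# for the ActiveRecord Log model.
LogRecord = Struct.new(:id, :sensor_id, :status)

# Pretend these are the newest logs returned by the MAX(id) subquery.
logs = [
  LogRecord.new(101, 1, "online"),
  LogRecord.new(102, 2, "offline")
]

# Build the sensor_id => latest-log Hash exactly as in the answer above.
sensor_logs = logs.each_with_object({}) { |log, hash| hash[log.sensor_id] = log }

sensor_logs[1].status  # => "online" (constant-time lookup, no extra query)
```

In the view, each sensor's row can then read `@sensor_logs[sensor.id]` instead of issuing its own query.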

Regarding your comment about storing the latest log id - you are essentially asking if you should build a cache. The answer would be 'it depends'. There are many advantages and many disadvantages to caching, so it comes down to 'is the benefit worth the cost'. From what you are describing, it doesn't appear that you are familiar with the difficulties they introduce (Google 'cache invalidation') or if they are applicable in your case. I'd recommend against it until you can demonstrate that a) it is adding real value over a non-cache solution, and b) it can be safely applied for your scenario.

There are 3 options:

  1. eager loading
  2. joining
  3. caching the current status

--

  1. is explained by PinnyM

  2. You can do a join from the Sensor just to the latest Log record for each row, so everything gets fetched in the one query. Not sure off hand how that'll perform with the number of rows you have, likely it'll still be slower than you want.
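One way to express option 2 in Postgres is DISTINCT ON, which keeps only the newest row per sensor_id. This is a sketch, assuming only the sensor_id and id columns described in the question; it is Postgres-specific, and the (sensor_id, id DESC) index from the question supports the required ordering:

```sql
-- One query: each sensor joined to only its newest log.
SELECT sensors.*, latest.*
FROM sensors
JOIN (
  SELECT DISTINCT ON (sensor_id) *
  FROM logs
  ORDER BY sensor_id, id DESC
) latest ON latest.sensor_id = sensors.id;
```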

  3. The thing you mentioned - caching the latest_log_id (or even caching just the latest_status if that's all you need for the dashboard) is actually OK. It's called denormalization and it's a useful thing if used carefully. You've likely come across "counter cache" plugins for rails which are in the same vein - duplicating data, in the interests of being able to optimise read performance.
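The denormalized write can be sketched in plain Ruby with Struct stand-ins (not the real models; in Rails this would typically live in an after_create callback on Log, and it assumes a latest_log_id column has been added to sensors):

```ruby
# Plain-Ruby sketch of the denormalization idea: every log write also
# updates the sensor's cached pointer, so the dashboard read never has
# to scan the logs table.
SensorRow = Struct.new(:id, :latest_log_id)
LogRow    = Struct.new(:id, :sensor_id)

def create_log(log_id, sensor)
  log = LogRow.new(log_id, sensor.id)
  sensor.latest_log_id = log.id   # the denormalized write (cache update)
  log
end

sensor = SensorRow.new(1, nil)
create_log(500, sensor)
create_log(501, sensor)
sensor.latest_log_id  # => 501, read directly, no logs lookup needed
```

The cost, as both answers note, is keeping that pointer correct on every write path (including deletes and backfills), which is exactly the cache-invalidation problem.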
