Rails, Heroku, and Resque: Long Running Background Job Optimization

We're building a Tinder-style app that lets users "like" or "dislike" events. Each event has about 100 keywords associated with it. When a user "likes" or "dislikes" an event, we associate that event's keywords with the user, so users can quickly accumulate thousands of keywords.

We use through tables to associate users and events with keywords (event_keywords and user_keywords). Each through table has an additional float column, relevance_score (e.g. 0.1 if a keyword is only slightly relevant, 0.9 if it's very relevant).

Our goal is to show users the most relevant events based on their keywords, so Event has many event_rankings, each of which belongs to a user. In effect, we want to rank every event differently for each user.

Here are the models:

User.rb:

  has_many :user_keywords, :dependent => :destroy
  has_many :keywords, :through => :user_keywords
  has_many :event_rankings, :dependent => :destroy
  has_many :events, :through => :event_rankings

Event.rb

  has_many :event_keywords, :dependent => :destroy
  has_many :keywords, :through => :event_keywords
  has_many :event_rankings, :dependent => :destroy
  has_many :users, :through => :event_rankings

UserKeyword.rb:

  belongs_to :user
  belongs_to :keyword

EventKeyword.rb:

  belongs_to :keyword
  belongs_to :event

EventRanking.rb:

  belongs_to :user
  belongs_to :event

Keyword.rb:

  has_many :event_keywords, :dependent => :destroy
  has_many :events, :through => :event_keywords
  has_many :user_keywords, :dependent => :destroy
  has_many :users, :through => :user_keywords

We have a method that calculates how relevant an event is to a specific user based on their keywords. The method itself runs quickly, since it's just arithmetic.

User.rb:

def calculate_event_relevance(event_id)
  ## Step 1: Find which of the event's keywords the user has
  ## Step 2: Compare those keywords and calculate a score
  ## Step 3: Update the event_ranking row for this user
end
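The scoring math itself isn't shown above, but a minimal sketch of one plausible formula (a hypothetical weighted sum over shared keywords, not the app's actual method) might look like this, with keyword relevance maps passed in as plain hashes:

```ruby
# Hypothetical scoring: sum of user_score * event_score over shared keywords.
# Inputs are hashes of keyword_id => relevance_score, i.e. what the
# user_keywords and event_keywords through tables would provide.
def relevance_score(user_keywords, event_keywords)
  shared = user_keywords.keys & event_keywords.keys
  shared.sum { |id| user_keywords[id] * event_keywords[id] }
end
```

Keeping the math on plain hashes like this (rather than querying inside the loop) matters later: it means each score costs only a hash intersection, not a database round trip.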

Every time a user "likes" or "dislikes" an event, a background job is created:

RecalculateRelevantEvents.rb:

def self.perform(event_id)
  ## Step 1: Find any events that share keywords with Event.find(event_id)
  ## Step 2: Run calculate_event_relevance(event) for each event from the above step
end
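Step 1 of the job is essentially a keyword-overlap lookup. A minimal sketch of that lookup, using an in-memory index of event_id => keyword-id Set (in the real app this would be a query against event_keywords, not a hash):

```ruby
require "set"

# Sketch: find events sharing at least one keyword with the given event.
# `index` maps event_id => Set of keyword_ids; in the app this data lives
# in the event_keywords through table.
def similar_events(event_id, index)
  target = index.fetch(event_id)
  index.select { |id, kws| id != event_id && kws.intersect?(target) }.keys
end
```

In SQL terms this is a self-join on event_keywords by keyword_id, which is worth doing in one query rather than per-event Ruby loops.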

So here's a summary of the process:

  1. User likes or dislikes an event
  2. Background job is created which finds similar events to event in step 1
  3. Every similar event is recalculated based on user's keywords

I'm trying to figure out how to optimize my approach, since it can quickly get out of hand: the average user swipes through about 20 events per minute, an event can have up to 1,000 similar events, and each event has around 100 keywords.

So with my approach, each swipe means looping over up to 1,000 events and then over roughly 100 keywords per event (on the order of 100,000 keyword comparisons per swipe), and this happens 20 times a minute per user.

How should I approach this?

Do you have to recalculate per swipe? Could you debounce it, and recalculate for the user no more than once every 5 minutes?

This data doesn't need to be updated 20 times a minute to be useful; in fact, being updated even once a minute is probably more often than is useful.

With a 5-minute debounce, you go from 100 (20 * 5) recalculation jobs per user to 1 in the same period, which is a pretty big saving.

I would also recommend using Sidekiq if you can; with its multithreaded processing you'll get a huge boost in the number of simultaneous jobs. I'm a big fan.

And then, once you are using it, you could try a gem like https://github.com/hummingbird-me/sidekiq-debounce, which provides exactly the kind of debounce I was suggesting.
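The debounce logic itself is simple even without the gem. A minimal framework-free sketch (assumed names; in practice the last-run timestamps would live in Redis, not a Ruby hash):

```ruby
# Only allow a recalculation for a user if none ran within the window.
# `last_run` maps user_id => unix timestamp of the last recalculation.
DEBOUNCE_WINDOW = 5 * 60 # seconds

def debounce(last_run, user_id, now = Time.now.to_i)
  return false if last_run[user_id] && now - last_run[user_id] < DEBOUNCE_WINDOW
  last_run[user_id] = now
  true
end
```

The enqueue path then becomes: on each swipe, call debounce and only create the background job when it returns true; swipes inside the window are absorbed and picked up by the next run.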
