
Need alternative to filters/observers for Ruby on Rails project

Rails has a nice set of filters (before_validation, before_create, after_save, etc.) as well as support for observers, but I'm faced with a situation in which relying on a filter or observer is far too computationally expensive. I need an alternative.

The problem: I'm logging web server hits to a large number of pages. What I need is a trigger that will perform an action (say, send an email) when a given page has been viewed more than X times. Due to the huge number of pages and hits, using a filter or observer will result in a lot of wasted time because, 99% of the time, the condition it tests will be false. The email does not have to be sent out right away (i.e. a 5-10 minute delay is acceptable).

What I am instead considering is implementing some kind of process that sweeps the database every 5 minutes or so, checks which pages have been hit more than X times, records that state in a new DB table, and then sends out the corresponding emails. It's not exactly elegant, but it will work.
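The selection step of such a sweep can be sketched in plain Ruby, independent of Rails; the threshold value and the notified-pages bookkeeping below are illustrative assumptions, not part of any Rails API:

```ruby
# Sketch of the 5-minute sweep's selection step: given per-page hit totals
# and the set of pages already notified, pick the pages that have newly
# crossed the threshold.
THRESHOLD = 100  # illustrative stand-in for "X times"

def pages_needing_email(hit_counts, already_notified)
  hit_counts.select { |page, hits| hits > THRESHOLD }
            .keys
            .reject { |page| already_notified.include?(page) }
end
```

In the real task, the hit counts would come from a grouped COUNT query and the already-notified set from the new DB table described above.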

Does anyone else have a better idea?

Rake tasks are nice! But you will end up writing more custom code for each background job you add. Check out the Delayed Job plugin: http://blog.leetsoft.com/2008/2/17/delayed-job-dj

DJ is an asynchronous priority queue that relies on one simple database table. According to the DJ website, you can create a job using the Delayed::Job.enqueue() method as shown below.

class NewsletterJob < Struct.new(:text, :emails)
  def perform
    emails.each { |e| NewsletterMailer.deliver_text_to_email(text, e) }
  end
end

Delayed::Job.enqueue( NewsletterJob.new("blah blah", Customers.find(:all).collect(&:email)) )
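Adapted to the question's threshold email, a job might look like the sketch below. Only the Struct-based job pattern comes from DJ; ThresholdAlertJob, PageAlertMailer, and the message format are hypothetical names for illustration:

```ruby
# Hypothetical job following the same Struct pattern as the DJ example.
# PageAlertMailer stands in for whatever mailer the app defines.
class ThresholdAlertJob < Struct.new(:page_path, :hits)
  # Building the message separately keeps the sketch testable without a mailer.
  def message
    "Page #{page_path} has been viewed #{hits} times"
  end

  def perform
    PageAlertMailer.deliver_threshold_alert(message)
  end
end

# Enqueued the same way as the newsletter example:
# Delayed::Job.enqueue(ThresholdAlertJob.new("/some/page", 120))
```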

I was once part of a team that wrote a custom ad server with the same requirements: monitor the number of hits per document, and do something once they reach a certain threshold. The server was going to power an existing very large site with a lot of traffic, and scalability was a real concern. My company hired two Doubleclick consultants to pick their brains.

Their opinion was: the fastest way to persist any information is to write it in a custom Apache log directive. So we built a site where, every time someone hit a document (ad, page, all the same), the server that handled the request would write a SQL statement to the log: "INSERT INTO impressions (timestamp, page, ip, etc) VALUES (x, 'path/to/doc', y, etc);" -- all output dynamically with data from the web server. Every 5 minutes, we would gather these files from the web servers and dump them into the master database one at a time. Then, at our leisure, we could parse that data to do anything we pleased with it.

Depending on your exact requirements and deployment setup, you could do something similar. The computational cost of checking whether you're past a certain threshold is probably even smaller (guessing here) than executing the SQL to increment a value or insert a row. You could get rid of both bits of overhead by logging hits (special format or not), then periodically gathering the logs, parsing them, inserting the data into the database, and doing whatever you want with it.
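The gather-and-parse step can be sketched without any web server at all. The request-line regex below assumes Apache's common/combined log format and is a deliberate simplification:

```ruby
# Count hits per page from raw access-log lines. Matching on the quoted
# request line ("GET /path ...") is an assumption about the log format.
def tally_hits(log_lines)
  counts = Hash.new(0)
  log_lines.each do |line|
    counts[$1] += 1 if line =~ /"(?:GET|POST|HEAD) (\S+)/
  end
  counts
end
```

The resulting hash can then be bulk-inserted, or compared directly against the threshold before anything touches the database.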

When saving your Hit model, update a redundant column in your Page model that stores a running total of hits. This costs you 2 extra queries, so each hit may take twice as long to process, but then you can decide whether to send the email with a simple if.
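Stripped of ActiveRecord, the check this describes is just a counter crossing a threshold exactly once. In the sketch below, Page is a plain Struct standing in for the model and the threshold is an assumed value:

```ruby
THRESHOLD = 100  # assumed stand-in for "X times"
Page = Struct.new(:path, :hits_count)

# Increment the redundant counter and report whether this hit is the one
# that crossed the threshold -- true exactly once per page.
def record_hit!(page)
  page.hits_count += 1
  page.hits_count == THRESHOLD
end
```

In Rails terms, the increment would be the UPDATE on the redundant column, and a true return is the "simple if" that triggers the email.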

Your original solution isn't bad either.

I have to write something here so that Stack Overflow code-highlights the first line.

class ApplicationController < ActionController::Base
  before_filter :increment_fancy_counter

  private

  def increment_fancy_counter
    # somehow increment the counter here
  end
end

# lib/tasks/fancy_counter.rake
namespace :fancy_counter do
  task :process do
    # somehow process the counter here
  end
end

Have a cron job run rake fancy_counter:process however often you want it to run.
