
Speed up database query using difference between 2 columns: created_at and updated_at

In my Rails project I have a Message model, and there are hundreds of thousands of messages in my database. The model also has a status column that can be 'queued' or 'delivered'.

When a message is created, its status is set to 'queued' and the created_at field is populated. After some time (I won't go into the details of how), the status of that message becomes 'delivered'.

Now, for hundreds of thousands of messages, I want to group them by their delivery times. In other words, I want to calculate the difference between updated_at and created_at and group the messages into 0-3 minutes, 3-5 minutes, 5-10 minutes, and over 10 minutes.
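To make the grouping concrete, here is a minimal plain-Ruby sketch of the bucketing rule described above. It only illustrates the boundaries and is not a replacement for doing the work in the database. The method names delivery_bucket and bucket_counts are hypothetical, and the sketch assumes lower bounds are inclusive and upper bounds exclusive.

```ruby
# Returns the label of the bucket that a delivery time (in seconds) falls into.
# Assumption: lower bounds are inclusive, upper bounds are exclusive.
def delivery_bucket(seconds)
  if seconds < 180 then "0-3 minutes"
  elsif seconds < 300 then "3-5 minutes"
  elsif seconds < 600 then "5-10 minutes"
  else "over 10 minutes"
  end
end

# Counts how many delivery times (in seconds) land in each bucket.
def bucket_counts(durations)
  counts = Hash.new(0)
  durations.each { |seconds| counts[delivery_bucket(seconds)] += 1 }
  counts
end
```

Subtracting two Time values in Ruby yields the difference in seconds as a Float, so a call such as delivery_bucket(delivered_at - created_at) would work directly on timestamps loaded into memory.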

The way I currently do it is:

    delivery_time_data = []
    # Bounds are in seconds; the last bucket is capped at 31 days.
    time_intervals = [{lb: 0.0, ub: 180.0}, {lb: 180.0, ub: 300.0}, {lb: 300.0, ub: 600.0}, {lb: 600.0, ub: 31*3600*24}]
    time_intervals.each_with_index do |ti, i|
      # One COUNT query is issued per interval.
      @messages = Message.where(account_id: @account.id)
                      .where(created_at: @start_date..@end_date)
                      .where(direction: 'outgoing')
                      .where(status: Message::STATUS_DELIVERED)
                      .where('status_updated_at - created_at >= ?', "#{ti[:lb]} seconds")
                      .where('status_updated_at - created_at < ?', "#{ti[:ub]} seconds")
      # Convert to whole minutes after dividing, so labels read "3", not "3.0".
      if i == time_intervals.count - 1
        delivery_time_data.push([i+1, "Greater than #{(ti[:lb]/60).to_i} minutes", @messages.count])
      else
        delivery_time_data.push([i+1, "#{(ti[:lb]/60).to_i} minutes to #{(ti[:ub]/60).to_i} minutes", @messages.count])
      end
    end

It works, but it's very slow, and when I have ~200,000 messages the server can potentially crash.

If I expect messages to be created fairly frequently, is it even a good idea to add an index on created_at?

Thanks.

It may be that you need the right index.

The fields you need to index are:

  • direction
  • status
  • account_id
  • created_at

The columns compared for equality come first and created_at, which is filtered by range, comes last. So add the following index in a migration:

add_index :messages, [:direction, :status, :account_id, :created_at]

Some databases, including PostgreSQL, can index on expressions. For best results, add ( updated_at - created_at ) as the fifth value in the index. The indexed expression has to match the one the query filters on, which is status_updated_at - created_at in the code above. You will have to create this with SQL instead of the Rails migration helper.
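As a rough sketch of what that SQL might look like, the helper below builds a PostgreSQL CREATE INDEX statement with the expression as the fifth entry. The method name, the default index name, and the default table name are hypothetical. The expression uses status_updated_at because that is the column in the question's query, which is an assumption about the schema.

```ruby
# Builds a PostgreSQL CREATE INDEX statement whose last entry is an expression.
# Assumptions: table "messages", a status_updated_at column, and an index name
# chosen here purely for illustration.
def delivery_time_index_sql(table: "messages", index_name: "index_messages_on_delivery_time")
  <<~SQL
    CREATE INDEX #{index_name}
    ON #{table} (direction, status, account_id, created_at,
                 (status_updated_at - created_at))
  SQL
end
```

Inside a migration, the returned string could be passed to execute, with a matching DROP INDEX statement to reverse it. PostgreSQL requires the extra parentheses around the expression.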

I wouldn't worry about the extra time it takes to create records on an indexed table.
