
Rails 3.1 - huge query time difference between MySQL and PostgreSQL on Heroku

I have a query in my dev environment that typically takes about 1.7 ms to run on the dev MySQL database. When deployed to Heroku and PostgreSQL, the same query on the same data set takes about 1.2 seconds!

SELECT distinct user_id, score, quality 
FROM `reports` 
WHERE (datetime_utc >= '2012-01-13 14:00:00' AND 
       datetime_utc <= '2012-01-14 14:00:00') 
ORDER BY score DESC, quality DESC LIMIT 20

I created a compound index on score and quality, which helped with the MySQL version, but the query running on PostgreSQL is still very, very slow. My first instinct is to check that the index is actually in place on the Heroku side, but I'm not quite sure how to do that. In any case, I have a feeling it has more to do with the fact that MySQL and PostgreSQL don't do things quite the same way.

Any insights or pointers would be hugely appreciated!

Try this modified query:

SELECT user_id, score, quality
FROM   reports
WHERE  datetime_utc BETWEEN '2012-01-13 14:00:00' AND '2012-01-14 14:00:00'
GROUP  BY user_id, score, quality
ORDER  BY score DESC, quality DESC
LIMIT  20
  • Since DISTINCT is applied last, it may be slower than GROUP BY when there are many duplicate rows. You'd have to test with EXPLAIN ANALYZE. Otherwise, the result is the same.

  • Minor simplification of the WHERE clause with BETWEEN. Removed the non-standard MySQL syntax (the backticks around the table name).

  • An index on (score, quality) will hardly get used. The useful index here, which should make a big difference in most scenarios, is:

CREATE INDEX reports_datetime_utc_idx ON reports (datetime_utc);

The important part is the index.
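To check whether an index actually exists on the PostgreSQL side and whether the planner uses it, something like the following could be run from any SQL console connected to the database. This is only a sketch: it assumes the table is named reports as in the question, and it relies on the standard pg_indexes catalog view and EXPLAIN ANALYZE.

```sql
-- List the indexes PostgreSQL has on the table, with their definitions.
SELECT indexname, indexdef
FROM   pg_indexes
WHERE  tablename = 'reports';

-- Execute the query and print the actual plan with timings.
-- A sequential scan on reports here suggests no usable index was found.
EXPLAIN ANALYZE
SELECT user_id, score, quality
FROM   reports
WHERE  datetime_utc BETWEEN '2012-01-13 14:00:00' AND '2012-01-14 14:00:00'
GROUP  BY user_id, score, quality
ORDER  BY score DESC, quality DESC
LIMIT  20;
```

Running EXPLAIN ANALYZE once on the DISTINCT version and once on the GROUP BY version is the simplest way to compare the two.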

Could the difference in performance be caused by a difference in dataset size between dev and Heroku?

An index on (score, quality) will not help much if there are many rows, since the query must still filter on datetime_utc.

You may wish to consider an index on datetime_utc, since the query needs to filter on it first.

If you really want to optimize for read speed, you could have a compound index on (datetime_utc, score, quality, user_id), which would completely eliminate the need to look up the row data.

However, be careful with that, since such a wide index may create a hotspot on inserts.
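As a rough sketch, the compound index described above might look like this. The index name is made up for illustration. Note also that PostgreSQL can only answer a query from the index alone through index-only scans, which were added in version 9.2; on older versions it still visits the table rows.

```sql
-- Hypothetical index name; the column order follows the suggestion above:
-- the filter column first, then the columns the query reads.
CREATE INDEX reports_datetime_score_quality_user_idx
    ON reports (datetime_utc, score, quality, user_id);
```

Whether the extra write cost is worth it depends on how insert-heavy the reports table is, so it is worth measuring both reads and inserts before keeping it.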

Heroku doesn't allow you to connect to the database unless you are on one of the plans costing more than $200/month, so you could try to retrieve a local copy of the database for local inspection.

heroku db:pull   # gives you a local copy of the db

The result will be something like this:

Receiving schema
Receiving data
8 tables, 591 records
users:         100% |================================| Time: 00:00:00
pages:         100% |================================| Time: 00:00:00
comments:      100% |================================| Time: 00:00:00
tags:          100% |================================| Time: 00:00:00
Receiving indexes
Resetting sequences
