
How can I make this query more efficient?

Edit: here is a simplified version of the original query (it runs in 3.6 seconds on a products table of 475K rows):

SELECT p.*, shop FROM products p JOIN
users u ON p.date >= u.prior_login and u.user_id = 22 JOIN
shops s ON p.shop_id = s.shop_id
ORDER BY shop, date, product_id;

This is the EXPLAIN plan:

id  select_type  table  type   possible_keys                     key      key_len  ref                     rows  Extra
1   SIMPLE       u      const  PRIMARY,prior_login,user_id       PRIMARY  4        const                   1     Using temporary; Using filesort
1   SIMPLE       s      ALL    PRIMARY                           NULL     NULL     NULL                    90
1   SIMPLE       p      ref    shop_id,date,shop_id_2,shop_id_3  shop_id  4        bitt3n_minxa.s.shop_id  5338  Using where

The bottleneck seems to be ORDER BY date, product_id. With these two orderings removed, the query runs in 0.06 seconds. (Removing either one of the two, but not both, has virtually no effect; the query still takes over 3 seconds.) I have indexes on both product_id and date in the products table. I have also added an index on (product, date) with no improvement.

newtover suggests the problem is that the INNER JOIN users u1 ON products.date >= u1.prior_login requirement prevents use of the index on products.date.

Two variations of the query that execute in ~0.006 seconds (as opposed to 3.6 seconds for the original) have been suggested to me (not from this thread).

This one uses a subquery, which appears to force the order of the joins:

SELECT p.*, shop 
  FROM 
  (
    SELECT p.*
    FROM products p 
    WHERE p.date >= (select prior_login FROM users where user_id = 22)
  ) as p
  JOIN shops s 
    ON p.shop_id = s.shop_id
  ORDER BY shop, date, product_id;

This one uses the WHERE clause to do the same thing (the presence of SQL_SMALL_RESULT doesn't change the execution time; it is 0.006 seconds without it as well):

SELECT SQL_SMALL_RESULT p.*, shop
FROM products p
INNER JOIN shops s ON p.shop_id = s.shop_id
WHERE p.date >= (
  SELECT prior_login
  FROM users
  WHERE user_id = 22
)
ORDER BY shop, date, product_id;

My understanding is that these queries run much faster because they reduce the number of relevant rows in the products table before joining it to the shops table. I am wondering if this is correct.

Use the EXPLAIN statement to see the execution plan. You can also try adding an index to products.date and u1.prior_login.
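As a rough sketch of that advice (the index names here are made up, and the CREATE INDEX statements only make sense if equivalent indexes do not already exist), one could add the indexes and then prefix the query from the question with EXPLAIN:

```sql
-- Hypothetical index names; the columns are the ones compared in the join.
CREATE INDEX idx_products_date ON products (`date`);
CREATE INDEX idx_users_prior_login ON users (prior_login);

-- EXPLAIN shows the join order, the index chosen for each table,
-- and whether a temporary table or filesort is needed.
EXPLAIN
SELECT p.*, shop
FROM products p
JOIN users u ON p.`date` >= u.prior_login AND u.user_id = 22
JOIN shops s ON p.shop_id = s.shop_id
ORDER BY shop, p.`date`, p.product_id;
```

The columns worth reading first are key (which index was actually used) and Extra (look for Using temporary or Using filesort).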

Also, please make sure you have defined your foreign keys and that they are indexed.

Good luck.

We do need an EXPLAIN plan... but

Be very careful of select * from table where id in (select id from another_table). This pattern is notoriously slow. Generally it can be replaced by a join. The following query might run, although I haven't tested it.

SELECT shop,
       shops.shop_id AS shop_id,
       products.product_id AS product_id,
       brand,
       title,
       price,
       image AS image,
       image_width,
       image_height,
       0 AS sex,
       products.date AS date,
       fav1.favorited AS circle_favorited,
       fav2.favorited AS session_user_favorited,
       u2.username AS circle_username
  FROM products
       LEFT JOIN favorites fav2
          ON     fav2.product_id = products.product_id
             AND fav2.user_id = 22
             AND fav2.current = 1
       INNER JOIN shops
          ON shops.shop_id = products.shop_id
       INNER JOIN users u1
          ON products.date >= u1.prior_login AND u1.user_id = 22
       LEFT JOIN favorites fav1
          ON products.product_id = fav1.product_id
       LEFT JOIN friends f1
          ON f1.star_id = fav1.user_id
       LEFT JOIN users u2
          ON fav1.user_id = u2.user_id
 WHERE f1.fan_id = 22 OR fav1.user_id = 22
ORDER BY shop,
         DATE,
         product_id,
         circle_favorited

That the query is slow because of the ordering is rather obvious, since it is hard to find an index that could be used to satisfy ORDER BY in this case. The main problem is the products.date >= comparison, which prevents any index from being used for ORDER BY. And since you have a lot of data to output, MySQL starts using temporary tables for sorting.

What I would do is try to force MySQL to output the data in the order of an index that already has the required order, and remove the ORDER BY clause.

I am not at a computer to test, but here is how I would do it:

  • I would do all the inner joins first
  • then I would LEFT JOIN to a subquery that does all the computations on favorites, ordered by product_id, circle_favorited (which would provide the last ordering criterion).

So the question is how to get the data sorted on shop, date, product_id.

I am going to write about it a bit later =)

UPD1:

You should probably read something on how B-tree indexes work in MySQL. There is a good article about it on mysqlperformanceblog.com (I am currently writing from a mobile and don't have the link at hand). In short, you seem to be talking about one-column indexes, which arrange pointers to rows based on the sorted values of a single column. Compound indexes store an order based on several columns. Indexes are mostly used to operate on clearly defined ranges within them, to obtain as much information as possible before retrieving data from the rows they point at. Indexes usually do not know about other indexes on the same table; as a result, they are rarely merged. When there is no more information to take from the index, MySQL starts to operate directly on the data.

That is, an index on date cannot make use of the index on product_id, but an index on (date, product_id) can provide more information on product_id after a condition on date (for example, sorting by product_id for a specific date match).

Nevertheless, a range condition on date (>=) breaks this. That is what I was talking about.
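A minimal sketch of the difference, assuming a compound index on (date, product_id); the index name and the date literal are placeholders, not values from the question:

```sql
-- Hypothetical index name.
CREATE INDEX idx_products_date_product ON products (`date`, product_id);

-- Equality on the leading column: within a single date the index entries
-- are already in product_id order, so the index can satisfy ORDER BY.
EXPLAIN
SELECT product_id
FROM products
WHERE `date` = '2010-01-01'      -- placeholder value
ORDER BY product_id;

-- Range on the leading column: entries are ordered by (date, product_id),
-- not by product_id alone, so ORDER BY product_id needs a separate sort.
EXPLAIN
SELECT product_id
FROM products
WHERE `date` >= '2010-01-01'     -- placeholder value
ORDER BY product_id;
```

For the second statement one would typically expect Using filesort in the Extra column, while the first can usually be read straight from the index.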

UPD2:

As I understand it, the problem can be reduced to the following (this is where most of the time is spent):

SELECT p.*, shop
FROM products p
JOIN users u ON p.`date` >= u.prior_login and u.user_id = 22
JOIN shops s ON p.shop_id = s.shop_id
ORDER BY shop, `date`, product_id;

Now add an index on (user_id, prior_login) on users and on (date) on products, and try the following query:

SELECT STRAIGHT_JOIN p.*, shop
FROM (
  SELECT product_id, shop
  FROM users u
  JOIN products p
    ON user_id = 22 AND p.`date` >= prior_login
  JOIN shops s
    ON p.shop_id = s.shop_id
  ORDER BY shop, p.`date`, product_id
) as s
JOIN products p USING (product_id);

If I am correct, the query should return the same result, but quicker. It would be nice if you would post the result of EXPLAIN for the query.
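To check the plan, something along these lines could be run. This is only a sketch: the index name is invented, an index on products(date) is assumed to exist already, and the column references are qualified on the assumption that user_id and prior_login belong to users and product_id belongs to products.

```sql
-- Hypothetical index name; covers the lookup by user_id and the
-- prior_login value needed for the date comparison.
ALTER TABLE users ADD INDEX idx_users_id_prior_login (user_id, prior_login);

-- Same derived-table query as above, prefixed with EXPLAIN.
EXPLAIN
SELECT STRAIGHT_JOIN p.*, shop
FROM (
  SELECT p.product_id, shop
  FROM users u
  JOIN products p
    ON u.user_id = 22 AND p.`date` >= u.prior_login
  JOIN shops s
    ON p.shop_id = s.shop_id
  ORDER BY shop, p.`date`, p.product_id
) AS s
JOIN products p USING (product_id);
```

Comparing the rows and Extra columns of this plan with the plan of the original query should show whether the date filter is now applied before the join to shops.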
