How to avoid running an expensive sub-query twice in a union

I want to union two queries. Both queries use an inner join against a data set that is very expensive to compute, but the data set query is the same for both. For example:

SELECT veggie_id
FROM potatoes
INNER JOIN ( [...] ) massive_market
    ON massive_market.potato_id=potatoes.potato_id
UNION
SELECT veggie_id
FROM carrots
INNER JOIN ( [...] ) massive_market
    ON massive_market.carrot_id=carrots.carrot_id

Where [...] corresponds to a subquery that takes a second to compute and returns rows containing at least carrot_id and potato_id.

I want to avoid having the query for massive_market [...] twice in my overall query.

What's the best way to do this?

If that subquery takes more than a second to run, I'd say it's down to an indexing issue rather than the query itself (of course, without seeing the query that is somewhat conjecture, so I'd recommend posting it too). In my experience, 9 out of 10 slow-query issues come down to improper indexing of the database.

Ensure veggie_id, potato_id and carrot_id are indexed.

Also, if you're using any joins in the massive_market subquery, ensure the columns you're joining on are indexed too.
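
As a rough sketch of the indexing advice above, assuming these columns are not already primary keys and using made-up index names, the statements could look something like this:

```sql
-- Index names are hypothetical; skip any column that is already a primary key.
CREATE INDEX idx_potatoes_potato_id ON potatoes (potato_id);
CREATE INDEX idx_carrots_carrot_id ON carrots (carrot_id);
CREATE INDEX idx_potatoes_veggie_id ON potatoes (veggie_id);
CREATE INDEX idx_carrots_veggie_id ON carrots (veggie_id);
```

The tables and columns joined inside the massive_market subquery are not shown in the question, so their indexes are left out here.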

Edit

If indexing has been done properly, the only other solution I can think of off the top of my head is:

CREATE TEMPORARY TABLE tmp_veggies (potato_id [datatype], carrot_id [datatype]);

-- Run the expensive subquery once. Each column stores the matching veggie_id
-- (NULL where a massive_market row has no matching potato or carrot).
INSERT IGNORE INTO tmp_veggies (potato_id, carrot_id)
SELECT potatoes.veggie_id, carrots.veggie_id
FROM ( [...] ) massive_market
    LEFT OUTER JOIN potatoes ON massive_market.potato_id = potatoes.potato_id
    LEFT OUTER JOIN carrots ON massive_market.carrot_id = carrots.carrot_id;

SELECT carrot_id FROM tmp_veggies WHERE carrot_id IS NOT NULL
UNION
SELECT potato_id FROM tmp_veggies WHERE potato_id IS NOT NULL;

This way, you've reversed the query so it only runs the massive subquery once, and the UNION happens on the temporary table (which will be dropped automatically, but not until the connection is closed, so you may want to drop the table manually).
You can add any additional columns you need to the CREATE TEMPORARY TABLE and SELECT statements.
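
Dropping the table manually, as mentioned above, might look like this. This is a small sketch that assumes MySQL, which the INSERT IGNORE syntax suggests but the question does not confirm:

```sql
-- MySQL syntax: with the TEMPORARY keyword, only a temporary table is dropped,
-- so a permanent table that happens to share the name is left alone.
DROP TEMPORARY TABLE IF EXISTS tmp_veggies;
```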

The goal is to pull the repeated subquery out of the queries that each require it, so that it appears only once. So I kept potatoes and carrots together within one union subquery, and joined massive_market afterwards, outside that union.

This seems obvious, but my question originated from a much more complex query, and pulling this strategy off took rather more work in my case. For the simple example in my question above, it would result in something like:

SELECT veggie_id 
FROM (
  SELECT veggie_id, potato_id, NULL AS carrot_id FROM potatoes
  UNION
  SELECT veggie_id, NULL AS potato_id, carrot_id FROM carrots
) unionized
INNER JOIN ( [...] ) massive_market
  ON massive_market.potato_id=unionized.potato_id 
    OR massive_market.carrot_id=unionized.carrot_id
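
To see the rewritten query at work, here is a minimal self-contained sketch on toy data. The table market_rows and all the values are made up for illustration, and the small subquery on market_rows is only a stand-in for the expensive [...] subquery. The DISTINCT is an addition, not part of the query above: it restores the de-duplication that the outer UNION provided in the original two-join form.

```sql
-- Toy schema; market_rows is a hypothetical stand-in for the expensive subquery.
CREATE TABLE potatoes (potato_id INTEGER, veggie_id INTEGER);
CREATE TABLE carrots (carrot_id INTEGER, veggie_id INTEGER);
CREATE TABLE market_rows (potato_id INTEGER, carrot_id INTEGER);

INSERT INTO potatoes VALUES (1, 101), (2, 102);
INSERT INTO carrots VALUES (1, 201), (3, 203);
INSERT INTO market_rows VALUES (1, 3), (2, NULL);

-- The stand-in subquery appears only once in the whole statement.
SELECT DISTINCT unionized.veggie_id
FROM (
  SELECT veggie_id, potato_id, NULL AS carrot_id FROM potatoes
  UNION
  SELECT veggie_id, NULL AS potato_id, carrot_id FROM carrots
) unionized
INNER JOIN (SELECT potato_id, carrot_id FROM market_rows) massive_market
  ON massive_market.potato_id = unionized.potato_id
    OR massive_market.carrot_id = unionized.carrot_id
ORDER BY unionized.veggie_id;
-- Returns 101, 102 and 203; veggie 201 is absent because no market row has carrot_id 1.
```

The NULL padding columns never match anything in the join condition, since a comparison with NULL is not true, so each row of unionized can only match on the id of its own kind.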
