How to optimise lengthy MySQL query with lots of unions
I have this query which I need help with. So there is a table called insertjobticket with a column called DEL, which is a long character field that can contain multiple dates. I need to create an output table that contains one row for each time there is a date in the DEL field within a certain range of dates.
The reason I can't just do a simpler select ... where ... DEL like "%my_date%" is that the DEL column can contain multiple dates, and if it does, I need to return multiple rows to the output set, one row for each date that appears in the DEL column.
The solution I came up with works, but it is very slow. It looks like this:
create temporary table jobtrack.ship_helpert3 as
select * from
(
(
select
date_format(now() - interval 3 day, '%m/%d/%Y') as `Ship_Date`,
more_columns
from
jobticket.insertjobticket
where
DEL like concat('%',date_format(now() - interval 3 day, '%m/%d/%Y'),'%')
) union (
select
date_format(now() + interval 2 day, '%m/%d/%Y') as `Ship_Date`,
more_columns
from
jobticket.insertjobticket
where
DEL like concat('%',date_format(now() + interval 2 day, '%m/%d/%Y'),'%')
) union (
select
date_format(now() + interval 1 day, '%m/%d/%Y') as `Ship_Date`,
more_columns
from
jobticket.insertjobticket
where
DEL like concat('%',date_format(now() + interval 1 day, '%m/%d/%Y'),'%')
) union ...
) t;
Each select query checks whether any rows contain a certain date string (date_format(now() + interval @x day, '%m/%d/%Y')) in the DEL field. The query is built programmatically and can get very long, as I would like it to check for many, many dates.
The insertjobticket table contains 40K rows and is growing, so the query above takes way too long to complete. I understand why it takes so long: every union branch is effectively its own subquery that scans the whole table again, once for each date. I just don't know how to make this work more efficiently.
Does anyone know how to speed up this query?
Thanks for the help, and let me know if you need more clarification.
As already stated in the comments, the only correct solution would be to normalize your data. That means creating a new table with one delivery date and the primary key of insertjobticket per row, and letting the application use this table directly instead of the column DEL, or at least indirectly, through a trigger that updates this table every time the column DEL is updated.
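A minimal sketch of that normalized layout, using SQLite through Python's sqlite3 module as a stand-in for MySQL so it runs anywhere. The child table jobticket_del_date, its columns and the helper functions are hypothetical names, and the parser assumes DEL holds zero-padded mm/dd/YYYY dates, the format the question's date_format(..., '%m/%d/%Y') calls look for. Dates are stored as ISO YYYY-MM-DD text so that text order matches date order; in MySQL you would use a DATE column instead.

```python
import re
import sqlite3

# Assumes DEL contains zero-padded mm/dd/YYYY dates, as in the question.
DATE_RE = re.compile(r"\b(\d{2})/(\d{2})/(\d{4})\b")


def parse_del(del_text):
    """Return the distinct dates found in a DEL value as sorted ISO strings."""
    return sorted({f"{y}-{m}-{d}" for m, d, y in DATE_RE.findall(del_text or "")})


def build_schema(conn):
    conn.executescript("""
        CREATE TABLE insertjobticket (id INTEGER PRIMARY KEY, del TEXT);
        -- Hypothetical normalized table: one row per (ticket, delivery date).
        CREATE TABLE jobticket_del_date (
            ticket_id INTEGER NOT NULL REFERENCES insertjobticket(id),
            del_date  TEXT    NOT NULL,
            PRIMARY KEY (ticket_id, del_date)
        );
        CREATE INDEX idx_del_date ON jobticket_del_date (del_date);
    """)


def backfill(conn):
    """One-off migration: parse every existing DEL value into the new table."""
    rows = conn.execute("SELECT id, del FROM insertjobticket").fetchall()
    with conn:  # OR IGNORE makes re-running the backfill harmless
        conn.executemany(
            "INSERT OR IGNORE INTO jobticket_del_date (ticket_id, del_date) VALUES (?, ?)",
            [(tid, d) for tid, text in rows for d in parse_del(text)],
        )


def tickets_shipping_between(conn, start, end):
    """One row per (date, ticket); the range filter can use idx_del_date."""
    return conn.execute(
        """SELECT dd.del_date, t.id
           FROM jobticket_del_date AS dd
           JOIN insertjobticket AS t ON t.id = dd.ticket_id
           WHERE dd.del_date BETWEEN ? AND ?
           ORDER BY dd.del_date, t.id""",
        (start, end),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_schema(conn)
    conn.executemany(
        "INSERT INTO insertjobticket (id, del) VALUES (?, ?)",
        [(1, "03/01/2024 partial, rest 03/04/2024"), (2, "03/02/2024"), (3, "TBD")],
    )
    backfill(conn)
    print(tickets_shipping_between(conn, "2024-03-01", "2024-03-03"))
```

Ticket 1 yields two rows because its DEL holds two dates, which is exactly the one-row-per-date output the question asks for, but served from an index instead of a LIKE scan.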
Since you cannot do that, the following workaround should improve your query:
select
del_dates.Ship_Date,
othercolumns
from insertjobticket
join (
select concat(date_format(now() + interval 2 day, '%m/%d/%Y'))
collate utf8_general_ci as Ship_Date
union select concat(date_format(now() + interval 1 day, '%m/%d/%Y'))
union select concat(date_format(now() + interval -15 day, '%m/%d/%Y'))
...
) del_dates
on insertjobticket.del like concat('%', del_dates.Ship_Date, '%');
(Change the collation to the one you use in your table, or leave it out to see which one, if any, you need.)
This basically performs the required normalization step (for the requested dates) every time you execute the query, and it will not be able to use indexes. Just make sure your explain output shows "using join buffer" for the derived table, not for insertjobticket; otherwise, replace join with straight_join.
For 40k rows this might not be a big problem, and there is no other way around it anyway, except real normalization. Keep in mind that your query will slow down linearly with the number of rows (400k rows will take about 10 times as long as 40k), an effect that indexes would prevent.
So if it is too slow now (or at some point in the future), you will eventually have to normalize (or, as a workaround for the problems created by this workaround, add a column to mark old entries and exclude them in your join condition).
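A minimal sketch of the derived-table workaround, again with SQLite through Python's sqlite3 standing in for MySQL: the dates are formatted in Python rather than with date_format, || replaces concat, and CROSS JOIN, which SQLite's query planner treats as fixing the left table as the outer loop, plays the role of straight_join. The archived column and the function names are hypothetical; archived illustrates the "mark old entries" idea. insertjobticket is still read in full and matched with LIKE, so this removes the repeated per-date scans but not the linear growth.

```python
import sqlite3
from datetime import date, timedelta


def ship_dates(today, offsets):
    """mm/dd/YYYY strings for today + n days, mirroring
    date_format(now() + interval n day, '%m/%d/%Y')."""
    return [(today + timedelta(days=n)).strftime("%m/%d/%Y") for n in offsets]


def tickets_for_dates(conn, dates):
    """Return (ship_date, ticket id) for every requested date found in DEL."""
    if not dates:
        return []
    # Small derived table of dates, one "SELECT ?" per date; UNION drops duplicates.
    derived = " UNION ".join(["SELECT ? AS ship_date"] * len(dates))
    # For an inner join, filtering in WHERE is equivalent to the join condition.
    sql = f"""
        SELECT del_dates.ship_date, t.id
        FROM insertjobticket AS t
        CROSS JOIN ({derived}) AS del_dates  -- t stays the outer loop
        WHERE t.del LIKE '%' || del_dates.ship_date || '%'
          AND t.archived = 0                 -- hypothetical old-entry flag
        ORDER BY t.id, del_dates.ship_date
    """
    return conn.execute(sql, dates).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE insertjobticket (id INTEGER PRIMARY KEY, del TEXT, "
                 "archived INTEGER NOT NULL DEFAULT 0)")
    conn.executemany("INSERT INTO insertjobticket VALUES (?, ?, ?)", [
        (1, "03/01/2024 partial, rest 03/04/2024", 0),
        (2, "03/02/2024", 0),
        (3, "03/01/2024", 1),  # archived, so excluded
    ])
    print(tickets_for_dates(conn, ship_dates(date(2024, 3, 1), [0, 1, 3])))
```

Binding each date as a parameter keeps the programmatically built SQL free of quoting problems; only the number of "SELECT ?" branches changes with the date list.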
Btw, since you generate your code programmatically, creating the list of dates shouldn't be a problem; otherwise, you could use another subquery to generate a list of general dates and select only the ones in a specific range.
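One way to build that date list inside SQL is a recursive CTE that counts through a range of day offsets (MySQL supports WITH RECURSIVE from version 8.0). The sketch below shows the idea in SQLite via Python's sqlite3; the :base parameter is a hypothetical stand-in for now() so the result is reproducible, strftime('%m/%d/%Y', ...) stands in for date_format, and tickets_in_range is an illustrative name. The table is still scanned with LIKE; only the date generation moves into the query.

```python
import sqlite3

RANGE_SQL = """
    WITH RECURSIVE offsets(n) AS (
        SELECT :lo
        UNION ALL
        SELECT n + 1 FROM offsets WHERE n < :hi
    ),
    del_dates(ship_date) AS (
        -- base '2024-03-01' with n = -1 gives '02/29/2024'
        SELECT strftime('%m/%d/%Y', :base, n || ' days') FROM offsets
    )
    SELECT del_dates.ship_date, t.id
    FROM insertjobticket AS t
    CROSS JOIN del_dates
    WHERE t.del LIKE '%' || del_dates.ship_date || '%'
    ORDER BY t.id, del_dates.ship_date
"""


def tickets_in_range(conn, base, lo, hi):
    """Tickets whose DEL mentions any day from base+lo to base+hi (inclusive)."""
    return conn.execute(RANGE_SQL, {"base": base, "lo": lo, "hi": hi}).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE insertjobticket (id INTEGER PRIMARY KEY, del TEXT)")
    conn.executemany("INSERT INTO insertjobticket VALUES (?, ?)", [
        (1, "03/01/2024 partial, rest 03/04/2024"),
        (2, "02/29/2024"),
        (3, "03/10/2024"),  # outside the range below
    ])
    print(tickets_in_range(conn, "2024-03-01", -1, 3))
```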
When using a database, you must consider the eventual uses of the stored data.
In this case, you needed to parse DEL as you were about to store it, and build another table of pairs of dates (buried in DEL) and ids (of insertjobticket).
Trying to do the parsing after the fact is much slower and leads to scaling problems.
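A minimal sketch of parsing DEL at write time, again using SQLite through Python's sqlite3 as a stand-in for MySQL. save_ticket and the jobticket_del_date table are hypothetical names, the regex assumes zero-padded mm/dd/YYYY dates, and INSERT OR REPLACE is SQLite syntax (MySQL would typically use INSERT ... ON DUPLICATE KEY UPDATE). The ticket row and its date rows are written in one transaction, so the two tables stay consistent if a statement fails.

```python
import re
import sqlite3

DATE_RE = re.compile(r"\b(\d{2})/(\d{2})/(\d{4})\b")  # assumes mm/dd/YYYY

SCHEMA = """
    CREATE TABLE insertjobticket (id INTEGER PRIMARY KEY, del TEXT);
    CREATE TABLE jobticket_del_date (
        ticket_id INTEGER NOT NULL REFERENCES insertjobticket(id),
        del_date  TEXT    NOT NULL,  -- ISO YYYY-MM-DD
        PRIMARY KEY (ticket_id, del_date)
    );
"""


def save_ticket(conn, ticket_id, del_text):
    """Insert or update a ticket and rewrite its parsed delivery dates."""
    dates = sorted({f"{y}-{m}-{d}" for m, d, y in DATE_RE.findall(del_text)})
    with conn:  # one transaction: commit on success, roll back on error
        # Drop the old date rows first, then replace the ticket row itself.
        conn.execute("DELETE FROM jobticket_del_date WHERE ticket_id = ?", (ticket_id,))
        conn.execute("INSERT OR REPLACE INTO insertjobticket (id, del) VALUES (?, ?)",
                     (ticket_id, del_text))
        conn.executemany("INSERT INTO jobticket_del_date (ticket_id, del_date) VALUES (?, ?)",
                         [(ticket_id, d) for d in dates])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    save_ticket(conn, 1, "03/01/2024 partial, rest 03/04/2024")
    save_ticket(conn, 1, "all moved to 03/05/2024")  # an edit replaces the old dates
    print(conn.execute("SELECT ticket_id, del_date FROM jobticket_del_date").fetchall())
```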