
How to optimise a lengthy MySQL query with lots of unions

I have this query which I need help with. There is a table called insertjobticket with a column called DEL, a long character field that can contain multiple dates. I need to create an output table that contains one row for each date in the DEL field that falls within a certain range of dates.

The reason I can't just do a simpler select ... where ... DEL like "%my_date%" is that the DEL column can contain multiple dates, and if so, I need to return multiple rows to the output set, one row for each date that appears in the DEL column.

The solution I came up with works, but is very slow. It looks like this:

create temporary table jobtrack.ship_helpert3 as 
select * from 
(
    (
    select  
        date_format(now() - interval 3 day, '%m/%d/%Y') as `Ship_Date`,
        more_columns 
    from   
        jobticket.insertjobticket 
    where         
        DEL like concat('%',date_format(now() - interval 3 day, '%m/%d/%Y'),'%')
    ) union (
    select  
        date_format(now() + interval 2 day, '%m/%d/%Y') as `Ship_Date`,
        more_columns 
    from   
        jobticket.insertjobticket 
    where         
        DEL like concat('%',date_format(now() + interval 2 day, '%m/%d/%Y'),'%')
    ) union (
    select  
        date_format(now() + interval 1 day, '%m/%d/%Y') as `Ship_Date`, 
        more_columns 
    from   
        jobticket.insertjobticket 
    where         
        DEL like concat('%',date_format(now() + interval 1 day, '%m/%d/%Y'),'%')
    ) union ...
) t;

Each select query checks whether any rows have a certain date string ( date_format(now() + interval @x day, '%m/%d/%Y') ) in the DEL field. The query is built programmatically and can get very long, as I would like to be able to check for many dates.

The insertjobticket table contains 40K rows and is growing, so the query above takes far too long to complete. I understand why it takes so long: every branch of the union is effectively its own sub-query that scans the whole table again for each date. I just don't know how to make this work more efficiently.

Does anyone know how to speed up this query?

Thanks for the help, and let me know if you need more clarification.

As already stressed in the comments, the only correct solution would be to normalize your data. That means creating a new table with one delivery date and the primary key of insertjobticket per row, and letting the application use this table directly instead of the column DEL, or at least indirectly via a trigger that updates this table every time the column DEL is updated.
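For illustration only, a normalized layout along those lines might look like the sketch below. The table name insertjobticket_del_dates, the columns ticket_id and ship_date, and the assumption that insertjobticket has an integer primary key called id are all hypothetical; none of them appear in the question.

```sql
-- Hypothetical child table: one row per (ticket, delivery date) pair.
-- Assumes insertjobticket has an integer primary key named id.
create table jobticket.insertjobticket_del_dates (
    ticket_id int  not null,   -- primary key value of the insertjobticket row
    ship_date date not null,   -- one delivery date taken from DEL
    primary key (ticket_id, ship_date),
    key idx_ship_date (ship_date)
);

-- With real DATE values the range becomes an indexable predicate,
-- and a ticket with several dates naturally yields several rows.
select
    date_format(d.ship_date, '%m/%d/%Y') as Ship_Date,
    t.*                        -- stands in for the question's more_columns
from jobticket.insertjobticket_del_dates d
join jobticket.insertjobticket t on t.id = d.ticket_id
where d.ship_date between curdate() - interval 3 day
                      and curdate() + interval 2 day;
```

Storing the date as a DATE rather than a string is what lets the index on ship_date serve the range condition, so the whole chain of unions collapses into one where clause.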

Since you cannot do that, the following workaround should improve your query:

select 
  del_dates.Ship_Date, 
  othercolumns 
from insertjobticket
join (
    select concat(date_format(now() + interval 2 day, '%m/%d/%Y')) 
           collate utf8_general_ci as Ship_Date 
    union select concat(date_format(now() + interval 1 day, '%m/%d/%Y')) 
    union select concat(date_format(now() + interval -15 day, '%m/%d/%Y')) 
    ...  
) del_dates
on insertjobticket.del like concat('%', del_dates.Ship_Date, '%');

(Change the collation to the one you use in your table, or leave it out to see which one, if any, you need.)

This will basically do the required normalization step (for the requested dates) every time you execute the query, and it will not be able to use indexes. Just make sure your explain output shows using join buffer for the derived table, not for insertjobticket; otherwise replace join with straight_join.
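As a rough sketch of that check, the statement can be prefixed with explain, and the join order can be pinned with straight_join if the optimizer picks the other one. The two-date list is shortened for illustration, insertjobticket.* is a placeholder for the real column list, and the collate clause is left out here (add it back if MySQL reports an illegal mix of collations).

```sql
-- Inspect the plan: in the Extra column, "Using join buffer" should appear
-- on the row for the derived table (typically listed as <derivedN>),
-- not on the insertjobticket row.
explain
select del_dates.Ship_Date, insertjobticket.*
from insertjobticket
join (
    select date_format(now() + interval 2 day, '%m/%d/%Y') as Ship_Date
    union select date_format(now() + interval 1 day, '%m/%d/%Y')
) del_dates
on insertjobticket.del like concat('%', del_dates.Ship_Date, '%');

-- If the buffer shows up on insertjobticket instead, straight_join forces
-- the tables to be read in the order they are written.
select del_dates.Ship_Date, insertjobticket.*
from insertjobticket
straight_join (
    select date_format(now() + interval 2 day, '%m/%d/%Y') as Ship_Date
    union select date_format(now() + interval 1 day, '%m/%d/%Y')
) del_dates
on insertjobticket.del like concat('%', del_dates.Ship_Date, '%');
```

Either way, the aim is that the big table is scanned once and each row is compared against the small, buffered list of dates.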

For 40k rows, this might not be a big problem, and there is no other way around it anyway, except real normalization. Keep in mind that your query will slow down linearly with the number of rows (400k rows will take about 10 times as long as 40k), an effect that indexes would prevent. So if it is too slow now (or at some point in the future), you will eventually have to normalize (or, as a workaround for the problems created by this workaround, add a column to mark old entries and exclude them in your join condition).
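One possible reading of the "mark old entries" idea is sketched here. The column name is_archived and its index are hypothetical, and how rows get flagged (for example by a periodic job) is left open.

```sql
-- Hypothetical flag column: set to 1 once a ticket no longer needs checking.
alter table insertjobticket
    add column is_archived tinyint(1) not null default 0,
    add index idx_is_archived (is_archived);

-- Flagged rows are excluded in the join condition, so the LIKE comparison
-- only runs against tickets that are still live.
select del_dates.Ship_Date, insertjobticket.*
from insertjobticket
join (
    select date_format(now() + interval 2 day, '%m/%d/%Y') as Ship_Date
    union select date_format(now() + interval 1 day, '%m/%d/%Y')
) del_dates
on insertjobticket.is_archived = 0
and insertjobticket.del like concat('%', del_dates.Ship_Date, '%');
```

Since the date list can reach into the past, a row should only be flagged once all of its dates are older than the earliest date you ever query for.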

Btw, since you generate your code programmatically, it shouldn't be a problem to create the list of dates. Otherwise, you can use another subquery to generate a general list of dates and just select the ones in a specific range.
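A minimal sketch of such a date-generating subquery, assuming nothing beyond plain MySQL: two small digit tables are combined into day offsets, and the where clause keeps the wanted range. The names ones, tens and offset_days are made up for this example.

```sql
-- Offsets -15 .. +14 are built from two digit tables (0-9 and 0-2),
-- then each offset is turned into a formatted date string.
select date_format(curdate() + interval n.offset_days day, '%m/%d/%Y') as Ship_Date
from (
    select ones.d + 10 * tens.d - 15 as offset_days
    from (select 0 as d union all select 1 union all select 2 union all select 3
          union all select 4 union all select 5 union all select 6 union all select 7
          union all select 8 union all select 9) ones
    cross join (select 0 as d union all select 1 union all select 2) tens
) n
where n.offset_days between -3 and 2;   -- pick the range you need
```

This select could take the place of the hand-written union list inside the del_dates derived table, so only the two bounds in the where clause change between runs.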

When using a database, you must consider the eventual uses of the stored data.

In this case, you needed to parse DEL when you were about to store it, and build another table of pairs of dates (buried in DEL) and ids (of insertjobticket).

Trying to do the parsing after the fact is much slower and leads to scaling problems.
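If the existing rows do have to be converted once, the (id, date) pairs could in principle be pulled out of DEL in bulk. The sketch below assumes MySQL 8.0 or later (for regexp_substr), a primary key column named id, dates always written as mm/dd/yyyy, and at most four dates per DEL value; all of these are assumptions, not facts from the question.

```sql
-- One output row per date found in DEL: occurrence 1, 2, 3, 4 of the pattern.
-- Extend the occ list if a DEL value can hold more than four dates.
select
    t.id as ticket_id,
    str_to_date(
        regexp_substr(t.DEL, '[0-9]{2}/[0-9]{2}/[0-9]{4}', 1, n.occ),
        '%m/%d/%Y'
    ) as ship_date
from jobticket.insertjobticket t
join (select 1 as occ union all select 2 union all select 3 union all select 4) n
  on regexp_substr(t.DEL, '[0-9]{2}/[0-9]{2}/[0-9]{4}', 1, n.occ) is not null;
```

The result can feed an insert ... select into whatever table holds the pairs. It still scans every row, so it is meant as a one-off conversion, with new rows parsed at the time they are stored.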
