Best way to fill gaps in a time series in a MySQL query

Question

I need to fill in the gaps of a time series in a MySQL query result set. I'm testing the option of doing an outer join with a helper table that contains all of the data points of the time series (as indicated in this thread: How to fill date gaps in MySQL?).

The issue I'm running into is that adding this join significantly increases the query response time (it goes from under 1 second to 90 seconds).

Here's the original query:

select date_format(fact_data7.date_collected,'%Y-%m') as date_col
   , date_format(fact_data7.date_collected,'%d-%H:%i:%s') as time_col
   , fact_data7.batch_id,fact_data7.value as fdvalue,entities.ticker as ticker
   , date_format(fact_data7.date_collected,'%Y-%m-%d') as date_col2
   , date_format(fact_data7.date_collected,'%Y') as year 
from fact_data7  
JOIN entities on fact_data7.entity_id=entities.id  
where (1=1)
  AND ((entities.id= 963
      AND fact_data7.metric_id=1
      ))
  AND date_format(fact_data7.date_collected,'%Y-%m') > '2008-01-01'
order by date_col asc

And here is the query with the outer join to the helper table (month_fill) added:

select date_format(month_fill.date,'%Y-%m') as date_col
    , date_format(fact_data7.date_collected,'%d-%H:%i:%s') as time_col
    , fact_data7.batch_id,fact_data7.value as fdvalue
    , entities.ticker as ticker
    , date_format(fact_data7.date_collected,'%Y-%m-%d') as date_col2
    , date_format(fact_data7.date_collected,'%Y') as year 
from fact_data7
JOIN entities
  on fact_data7.entity_id=entities.id  
RIGHT OUTER JOIN month_fill
   on date_format(fact_data7.date_collected,'%Y-%m') =  date_format(month_fill.date,'%Y-%m')  
where (1=1)
  AND (
      (entities.id= 963 AND fact_data7.metric_id=1)
      OR (entities.id is null and fact_data7.metric_id is null)
      )
  AND date_format(month_fill.date,'%Y-%m') > '2008-01-01'
order by date_col asc

Can I restructure the query to improve the performance, or is there an alternate solution to achieve what I'm looking for?

Update 11/15:

Here's the EXPLAIN output for the 1st query:

id  select_type  table       type   possible_keys  key      key_len  ref    rows    Extra
1   SIMPLE       entities    const  PRIMARY        PRIMARY  4        const  1       Using filesort
1   SIMPLE       fact_data7  ALL    NULL           NULL     NULL     NULL   230636  Using where

Here's the EXPLAIN output for the 2nd query:

id  select_type  table       type    possible_keys  key      key_len  ref                           rows    Extra
1   SIMPLE       month_fill  index   NULL           date     8        NULL                          204     Using where; Using index; Using temporary; Using filesort
1   SIMPLE       fact_data7  ALL     NULL           NULL     NULL     NULL                          230636  Using where
1   SIMPLE       entities    eq_ref  PRIMARY        PRIMARY  4        findata.fact_data7.entity_id  1       Using where

Answer 1

Before even looking at refactoring the query, I would start by adding an index on the date columns fact_data7.date_collected and month_fill.date. The range condition ">" you are using slows the query down, and adding an index should, in theory, improve performance. Note that the table needs enough records for this to pay off; otherwise the overhead of maintaining the index will only slow things down.

See this MySQL documentation: http://dev.mysql.com/doc/refman/5.0/en/optimization-indexes.html
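As a rough sketch of what adding such an index might look like (the index names below are made up for illustration, and the composite variant is an assumption that goes beyond what this answer proposes): the EXPLAIN output in the question already shows month_fill using a key called date, so only fact_data7, which is scanned with type ALL, appears to be missing one.

```sql
-- Sketch only: index names are hypothetical.
-- Single-column index on the date column, as suggested in the answer.
ALTER TABLE fact_data7
  ADD INDEX idx_fd7_date_collected (date_collected);

-- Alternative (an assumption, not part of the answer): a composite index
-- with the equality-filtered columns first and the range column last.
ALTER TABLE fact_data7
  ADD INDEX idx_fd7_entity_metric_date (entity_id, metric_id, date_collected);

-- Re-run EXPLAIN to see whether the optimizer picks an index up.
-- The date column is compared directly here, not through date_format().
EXPLAIN
SELECT fact_data7.value
FROM fact_data7
WHERE fact_data7.entity_id = 963
  AND fact_data7.metric_id = 1
  AND fact_data7.date_collected >= '2008-02-01';
```

If the type column for fact_data7 still reads ALL afterwards, the index is not being used for that query.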

I am not sure exactly what you are trying to achieve, but you could try using MySQL's ifnull(value1,value2) function. Your query could look something like the following:

select ifnull(date_format(fact_data7.date_collected,'%Y-%m'),date_format(month_fill.date,'%Y-%m')) as date_col, 
date_format(fact_data7.date_collected,'%d-%H:%i:%s') as time_col, 
fact_data7.batch_id,
fact_data7.value as fdvalue,
entities.ticker as ticker,
date_format(fact_data7.date_collected,'%Y-%m-%d') as date_col2 ,
date_format(fact_data7.date_collected,'%Y') as year 
-- month_fill is listed first because JOIN binds tighter than the comma operator,
-- so the ON clause below can only reference fact_data7 and entities
from month_fill, fact_data7
JOIN entities on fact_data7.entity_id=entities.id  
where ((entities.id= 963 AND fact_data7.metric_id=1) OR (entities.id is null and fact_data7.metric_id is null))
and date_format(fact_data7.date_collected,'%Y-%m') =  date_format(month_fill.date,'%Y-%m') -- you will need a condition similar to this, depending on the data
AND date_format(fact_data7.date_collected,'%Y-%m')>'2008-01-01'
order by date_col asc

Answer 2

I think it's worth trying to rewrite the where clause so that it doesn't use date_format(date_collected). You say you have an index on this field, but it's never used: the field is an argument of a function, and MySQL (before version 8.0.13) doesn't support function-based indexes.
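One possible way to apply this to the gap-filling query is sketched below. It rests on assumptions not confirmed in the question: that month_fill.date holds the first day of each month, and that fact_data7 has an index usable for entity_id, metric_id and date_collected. The lower bound '2008-02-01' mirrors the original condition as written, since the string '2008-01' does not compare greater than '2008-01-01'; use '2008-01-01' if January 2008 was meant to be included.

```sql
-- Sketch under the assumptions stated above, not a tested drop-in replacement.
select date_format(month_fill.date,'%Y-%m') as date_col
    , date_format(fact_data7.date_collected,'%d-%H:%i:%s') as time_col
    , fact_data7.batch_id
    , fact_data7.value as fdvalue
    , entities.ticker as ticker
    , date_format(fact_data7.date_collected,'%Y-%m-%d') as date_col2
    , date_format(fact_data7.date_collected,'%Y') as year
from month_fill
LEFT JOIN fact_data7
   -- bare column compared against a range, so an index on date_collected can be used
   on fact_data7.date_collected >= month_fill.date
  and fact_data7.date_collected <  month_fill.date + interval 1 month
   -- entity/metric filters live in the ON clause so months without data are kept
  and fact_data7.entity_id = 963
  and fact_data7.metric_id = 1
LEFT JOIN entities
   on entities.id = fact_data7.entity_id
where month_fill.date >= '2008-02-01'
order by month_fill.date asc
```

Keeping the entity and metric filters in the ON clause, not in the where clause, also means a month in which only other entities have data still shows up as a gap row with NULL values. The "is null" test in the original where clause would drop such a month.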

Answer 1
0 Accepted 2011-11-14 19:03:58

Answer 2
0 2011-11-15 13:37:33
