Best method to fill in gaps in a time series in a MySQL query
I need to fill in the gaps of a time series in a MySQL query result set. I'm testing the option of doing an outer join with a helper table that contains all of the data points of the time series (as indicated in this thread: How to fill date gaps in MySQL?).
The issue I'm running into is that adding this join significantly increases the query response time (it goes from under 1 second to 90 seconds).
Here's the original query:
select date_format(fact_data7.date_collected,'%Y-%m') as date_col
, date_format(fact_data7.date_collected,'%d-%H:%i:%s') as time_col
, fact_data7.batch_id,fact_data7.value as fdvalue,entities.ticker as ticker
, date_format(fact_data7.date_collected,'%Y-%m-%d') as date_col2
, date_format(fact_data7.date_collected,'%Y') as year
from fact_data7
JOIN entities on fact_data7.entity_id=entities.id
where (1=1)
AND ((entities.id= 963
AND fact_data7.metric_id=1
))
AND date_format(fact_data7.date_collected,'%Y-%m') > '2008-01-01'
order by date_col asc
And here is the query with the outer join to the helper table (month_fill) added:
select date_format(month_fill.date,'%Y-%m') as date_col
, date_format(fact_data7.date_collected,'%d-%H:%i:%s') as time_col
, fact_data7.batch_id,fact_data7.value as fdvalue
, entities.ticker as ticker
, date_format(fact_data7.date_collected,'%Y-%m-%d') as date_col2
, date_format(fact_data7.date_collected,'%Y') as year
from fact_data7
JOIN entities
on fact_data7.entity_id=entities.id
RIGHT OUTER JOIN month_fill
on date_format(fact_data7.date_collected,'%Y-%m') = date_format(month_fill.date,'%Y-%m')
where (1=1)
AND (
(entities.id= 963 AND fact_data7.metric_id=1)
OR (entities.id is null and fact_data7.metric_id is null)
)
AND date_format(month_fill.date,'%Y-%m') > '2008-01-01'
order by date_col asc
Can I restructure the query to improve the performance, or is there an alternate solution to achieve what I'm looking for?
Update 11/15:
Here's the EXPLAIN output for the 1st query:
id | select_type | table      | type  | possible_keys | key     | key_len | ref   | rows   | Extra
1  | SIMPLE      | entities   | const | PRIMARY       | PRIMARY | 4       | const | 1      | Using filesort
1  | SIMPLE      | fact_data7 | ALL   | NULL          | NULL    | NULL    | NULL  | 230636 | Using where
Here's the EXPLAIN output for the 2nd query:
id | select_type | table      | type   | possible_keys | key     | key_len | ref                          | rows   | Extra
1  | SIMPLE      | month_fill | index  | NULL          | date    | 8       | NULL                         | 204    | Using where; Using index; Using temporary; Using filesort
1  | SIMPLE      | fact_data7 | ALL    | NULL          | NULL    | NULL    | NULL                         | 230636 | Using where
1  | SIMPLE      | entities   | eq_ref | PRIMARY       | PRIMARY | 4       | findata.fact_data7.entity_id | 1      | Using where
Without even looking at refactoring the query, I would start by adding an index on the date columns fact_data7.date_collected and month_fill.date. The range comparison (">") you are doing slows the query down, and an index should in theory improve performance. Note that this only pays off when the table holds enough records; on a small table the cost of maintaining the index can outweigh the benefit.
See this MySQL documentation: http://dev.mysql.com/doc/refman/5.0/en/optimization-indexes.html
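As a rough sketch of the indexing suggestion above: the index names below are hypothetical, and the composite column order (equality columns first, then the date range column) is an assumption based on the filters in the question, not something the thread confirms. An index on date_collected can only help once the predicate compares the bare column rather than date_format(date_collected, ...).

```sql
-- Hypothetical index name; column names are taken from the queries in the question.
-- Equality columns (entity_id, metric_id) come first, the range column last.
CREATE INDEX idx_fact_entity_metric_date
    ON fact_data7 (entity_id, metric_id, date_collected);

-- The second EXPLAIN already reports a key named `date` on month_fill,
-- so the statement below is only needed if that index is missing.
-- CREATE INDEX idx_month_fill_date ON month_fill (date);
```

Re-running EXPLAIN after creating the index shows whether the optimizer actually picks it up for fact_data7.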
I am not sure exactly what you are trying to achieve, but you could try to do it using MySQL's ifnull(value1,value2) function. Your query could be somewhat like the following:
select ifnull(date_format(fact_data7.date_collected,'%Y-%m'),date_format(month_fill.date,'%Y-%m')) as date_col,
date_format(fact_data7.date_collected,'%d-%H:%i:%s') as time_col,
fact_data7.batch_id,
fact_data7.value as fdvalue,
entities.ticker as ticker,
date_format(fact_data7.date_collected,'%Y-%m-%d') as date_col2 ,
date_format(fact_data7.date_collected,'%Y') as year
from month_fill , fact_data7
JOIN entities on fact_data7.entity_id=entities.id
where ((entities.id= 963 AND fact_data7.metric_id=1) OR (entities.id is null and fact_data7.metric_id is null))
and date_format(fact_data7.date_collected,'%Y-%m') = date_format(month_fill.date,'%Y-%m') -- you will need a condition similar to this; it depends on the data
AND date_format(fact_data7.date_collected,'%Y-%m')>'2008-01-01'
order by date_col asc
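A variation on the same idea, offered only as a sketch: drive the query from month_fill with a LEFT JOIN and keep the entity and metric filters in the ON clause, so that months without matching facts still produce a row. It assumes month_fill holds exactly one row per month with date set to the first day of that month, which the thread does not state. The aliases m, f and e are introduced here for readability.

```sql
SELECT DATE_FORMAT(m.date, '%Y-%m')                  AS date_col,
       DATE_FORMAT(f.date_collected, '%d-%H:%i:%s')  AS time_col,
       f.batch_id,
       f.value                                       AS fdvalue,
       e.ticker                                      AS ticker,   -- NULL for gap months
       DATE_FORMAT(f.date_collected, '%Y-%m-%d')     AS date_col2,
       DATE_FORMAT(m.date, '%Y')                     AS year      -- taken from the helper table
FROM month_fill AS m
LEFT JOIN fact_data7 AS f
       ON f.entity_id = 963
      AND f.metric_id = 1
      -- Range join on the bare column (assumes m.date is the first day of the month).
      AND f.date_collected >= m.date
      AND f.date_collected <  m.date + INTERVAL 1 MONTH
LEFT JOIN entities AS e
       ON e.id = f.entity_id
-- The string comparison in the question ('%Y-%m' > '2008-01-01') starts at February 2008.
WHERE m.date >= '2008-02-01'
ORDER BY m.date ASC;
```

Filtering in the ON clause rather than in WHERE matters here: a WHERE condition on fact_data7 or entities would discard the NULL-extended rows that represent the gaps.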
I think it's worth trying to rewrite the where clause so that it does not use date_format(date_collected). You say you have an index on this field, but it's never used: the field is an argument of a function, and MySQL before 8.0.13 doesn't support function-based indexes.
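A minimal sketch of that rewrite for the first query, assuming date_collected is a DATE or DATETIME column. As far as I can tell, the original string comparison date_format(date_collected,'%Y-%m') > '2008-01-01' excludes January 2008 (the string '2008-01' sorts before '2008-01-01'), so the equivalent range starts at 2008-02-01; adjust the literal if a different cutoff was intended. The aliases f and e are introduced here for readability.

```sql
SELECT DATE_FORMAT(f.date_collected, '%Y-%m')        AS date_col,
       DATE_FORMAT(f.date_collected, '%d-%H:%i:%s')  AS time_col,
       f.batch_id,
       f.value                                       AS fdvalue,
       e.ticker                                      AS ticker,
       DATE_FORMAT(f.date_collected, '%Y-%m-%d')     AS date_col2,
       DATE_FORMAT(f.date_collected, '%Y')           AS year
FROM fact_data7 AS f
JOIN entities AS e
  ON f.entity_id = e.id
WHERE e.id = 963
  AND f.metric_id = 1
  -- Bare column on the left side, so an index on date_collected can be considered.
  AND f.date_collected >= '2008-02-01'
-- Ordering by the raw column keeps rows in month order and avoids sorting on a computed string.
ORDER BY f.date_collected ASC;
```

DATE_FORMAT in the select list is harmless for index use; only function calls wrapped around the column in WHERE or ON conditions prevent it.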