MySQL query: optimizing an existing MAX-MIN query for a huge table

I have a query that returns the correct result, but it takes about 45 seconds to run. That is far too long for presenting the data in a GUI.
So I am looking for a much faster, more efficient query (a few milliseconds would be nice). My data table has about 2,619,395 entries (corrected from an earlier figure of 3000) and is still growing.

Schema:

num | station | fetchDate             | exportValue | error
1   | PS1     | 2010-10-01 07:05:17   | 300         | 0
2   | PS2     | 2010-10-01 07:05:19   | 297         | 0
923 | PS1     | 2011-11-13 14:45:47   | 82771       | 0

Explanation

  • exportValue is always incrementing
  • exportValue represents the actual absolute value
  • in my case there are 10 stations
  • every ~15 minutes, 10 new entries are written to the table
  • error is just an indicator of whether the station is working properly

Working query:

SELECT
    YEAR(fetchDate), station, MAX(exportValue) - MIN(exportValue)
FROM
    registros
WHERE
    exportValue > 0 AND error = 0
GROUP BY
    station, YEAR(fetchDate)
ORDER BY
    YEAR(fetchDate), station

Output:

Year | station | Max-Min
2008 | PS1     | 24012
2008 | PS2     | 23709
2009 | PS1     | 28102
2009 | PS2     | 25098

My thoughts on it:

  1. Writing several queries with BETWEEN conditions, such as BETWEEN '2008-01-01' AND '2008-01-02' to fetch MIN(exportValue) and BETWEEN '2008-12-30' AND '2008-12-31' to fetch MAX(exportValue). Problem: this needs a lot of queries, and there may be no data in a given time range (data is not guaranteed to exist).
  2. Limiting the result set to my 10 stations only, using ORDER BY MIN(fetchDate). Problem: this query also takes a long time to process.

Additional info:
I'm using the query in a Java application (JPA 2.0), so it would be possible to do some post-processing on the result set if necessary.

Any help, approaches, or ideas are much appreciated. Thanks in advance.

Adding suitable indexes will help. Two compound indexes should speed things up significantly:

ALTER TABLE tbl_name ADD INDEX (error, exportValue);
ALTER TABLE tbl_name ADD INDEX (station, fetchDate);

This query running on 3000 records should be extremely fast.
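As a possible variation on the two indexes above, a single covering index that contains every column the query touches may let MySQL answer the query from the index alone, without reading the table rows. The sketch below assumes the table is the `registros` table from the question; the index name `idx_registros_cover` and the column order are illustrative assumptions, not something confirmed by the asker's setup.

```sql
-- Hypothetical covering index: the equality column (error) comes first,
-- then the grouping columns, then the aggregated column.
ALTER TABLE registros
    ADD INDEX idx_registros_cover (error, station, fetchDate, exportValue);

-- Check how MySQL plans the original query with the new index in place.
EXPLAIN
SELECT YEAR(fetchDate), station, MAX(exportValue) - MIN(exportValue)
FROM registros
WHERE exportValue > 0 AND error = 0
GROUP BY station, YEAR(fetchDate)
ORDER BY YEAR(fetchDate), station;
```

If the Extra column of the EXPLAIN output shows "Using index", the query is being served from the index. Whether the optimizer actually picks this index depends on the data distribution and the MySQL version, so it is worth comparing timings before and after.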

Suggestions:

  • Do you have a primary key set on this table? Perhaps (station, fetchDate)?
  • Add indexes; experiment with them, as rich.okelly suggested in his answer.
  • Depending on the results of the index experiments, try breaking your query into multiple statements inside one stored procedure; this way you will not lose time on network traffic between multiple queries sent from the client to MySQL.
  • You mentioned that you tried separate queries and ran into a problem when there is no data for a particular month; this is a regular case in business applications, and you should handle it in a "master query" (stored procedure or application code).
  • I guess fetchDate is the current date and time at the moment of record insertion; consider keeping previous months' data in a summary table with the fields year, month, station, max(exportValue), min(exportValue). This means you should insert summary records into the summary table at the end of each month; whether you delete, keep, or move the detail records to a separate table is your choice.
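The stored-procedure suggestion and the "no data" case could be combined roughly as follows. This is only a sketch: the procedure name `yearly_export_delta` and its parameter are hypothetical, it assumes an index on (station, fetchDate) exists, and it relies on the asker's statement that exportValue is always incrementing, so the yearly Max-Min equals the last value of the year minus the first.

```sql
-- DELIMITER is a mysql command-line client directive, not server-side SQL.
DELIMITER //

CREATE PROCEDURE yearly_export_delta(IN p_year INT)
BEGIN
    -- Half-open range [Jan 1 of p_year, Jan 1 of the next year)
    DECLARE v_from DATETIME DEFAULT MAKEDATE(p_year, 1);
    DECLARE v_to   DATETIME DEFAULT MAKEDATE(p_year + 1, 1);

    SELECT s.station,
           -- last valid reading of the year ...
           (SELECT r.exportValue
              FROM registros r
             WHERE r.station = s.station
               AND r.fetchDate >= v_from AND r.fetchDate < v_to
               AND r.error = 0 AND r.exportValue > 0
             ORDER BY r.fetchDate DESC
             LIMIT 1)
           -- ... minus the first valid reading of the year
         - (SELECT r.exportValue
              FROM registros r
             WHERE r.station = s.station
               AND r.fetchDate >= v_from AND r.fetchDate < v_to
               AND r.error = 0 AND r.exportValue > 0
             ORDER BY r.fetchDate ASC
             LIMIT 1) AS max_min
    FROM (SELECT DISTINCT station FROM registros) AS s;
END //

DELIMITER ;
```

Calling `CALL yearly_export_delta(2008);` would return one row per station. A station with no valid rows in that year gets NULL in `max_min` rather than causing an error, so the application code can decide how to display the gap.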

Since your table is growing rapidly (new rows every 15 minutes), you should take the last suggestion into account. There is probably no need to keep the detailed history in one place. Archiving data is a process that should be done as part of maintenance.
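A minimal sketch of such a summary table might look like this. The table name `registros_summary`, its column names, and the column types are assumptions made for illustration; the real types should match the `registros` table.

```sql
-- Hypothetical summary table: one row per station per month.
CREATE TABLE registros_summary (
    yr         SMALLINT    NOT NULL,
    mon        TINYINT     NOT NULL,
    station    VARCHAR(16) NOT NULL,
    max_export BIGINT      NOT NULL,
    min_export BIGINT      NOT NULL,
    PRIMARY KEY (yr, mon, station)
);

-- Run once at the start of each month to summarize the month that just ended.
INSERT INTO registros_summary (yr, mon, station, max_export, min_export)
SELECT YEAR(fetchDate), MONTH(fetchDate), station,
       MAX(exportValue), MIN(exportValue)
FROM registros
WHERE exportValue > 0 AND error = 0
  AND fetchDate >= DATE_FORMAT(CURRENT_DATE - INTERVAL 1 MONTH, '%Y-%m-01')
  AND fetchDate <  DATE_FORMAT(CURRENT_DATE, '%Y-%m-01')
GROUP BY YEAR(fetchDate), MONTH(fetchDate), station;

-- Yearly report: the yearly maximum is the largest monthly maximum,
-- and the yearly minimum is the smallest monthly minimum.
SELECT yr, station, MAX(max_export) - MIN(min_export) AS max_min
FROM registros_summary
GROUP BY yr, station
ORDER BY yr, station;
```

With 10 stations this table gains only 10 rows per month, so the yearly report scans a few hundred rows instead of millions. The current, unfinished month is not in the summary table yet, so a report that must include it would still need to combine this result with the recent detail rows.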
