Grouping by year and month with MySQL while leveraging indexes and avoiding temporary/filesort

I have a large dataset (which is going to keep growing!) in which the data read in bulk is stored with a DATE column, since every row in the core data tables belongs to a specific day (the context is analytics/reporting).

A lot of the views need data at a per-month rather than per-day level of detail, and I'm aggregating the data as needed via SQL (SUM, AVG, etc.).

This also means I'm grouping data by YEAR() and MONTH(), which cannot use the index on the DATE column, so EXPLAIN reports "Using temporary" and "Using filesort" for the query.

Is the best solution here to split the DATE column into three separate columns for year, month and day? Or to retain the DATE column (for constraints, sorting, etc.) and add a "yearmonth" (yyyymm) column that is also indexed? I don't like duplicating data, but I'm not 100% sure what the best practice is for this scenario.

I think the best way, in terms of performance, to GROUP BY and SELECT on month and year is to add MONTH and YEAR columns to the data. The speed you gain from proper index usage will outweigh the pain of some extra / duplicated data.

Note that there is a YEAR data type in MySQL.

Make sure to use B-TREE indexes on the month and year columns (not HASH).
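A minimal sketch of the extra-columns idea, using Python's built-in sqlite3 module as a stand-in for MySQL so it can be run anywhere. The table name daily_stats, the column names yr and mon, and the sample rows are all hypothetical; whether MySQL's optimizer actually uses the composite index for the GROUP BY should be confirmed with EXPLAIN on the real table.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")

# Keep the original date and add redundant year/month columns (hypothetical names).
conn.execute("""
    CREATE TABLE daily_stats (
        day   TEXT    NOT NULL,  -- ISO date, e.g. '2023-01-15'
        yr    INTEGER NOT NULL,
        mon   INTEGER NOT NULL,
        value REAL    NOT NULL
    )
""")
# One composite index in the same column order as the GROUP BY.
conn.execute("CREATE INDEX idx_year_month ON daily_stats (yr, mon)")

sample_rows = [("2023-01-15", 10.0), ("2023-01-20", 30.0), ("2023-02-01", 5.0)]
for day, value in sample_rows:
    d = date.fromisoformat(day)
    # The year and month are derived once, at insert time.
    conn.execute(
        "INSERT INTO daily_stats (day, yr, mon, value) VALUES (?, ?, ?, ?)",
        (day, d.year, d.month, value),
    )

# Group on the plain columns instead of YEAR(day), MONTH(day).
monthly = conn.execute("""
    SELECT yr, mon, SUM(value), AVG(value)
    FROM daily_stats
    GROUP BY yr, mon
    ORDER BY yr, mon
""").fetchall()
```

The point of the sketch is only that the grouping columns are stored values rather than function calls on the date, which is the precondition for an index to help.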

Do not split a DATE into component parts. The difficulties outweigh the presumed benefit.

Use Summary Tables to avoid lengthy analytics/reporting queries. See my blog on such. Roughly speaking, every night you would calculate some subtotals and counts for the past day and put these in a "Summary Table". Analytics would run much faster against that table than against the "Fact" table.
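The nightly roll-up described above might look roughly like the following sketch. It uses Python's sqlite3 module in place of MySQL so that it runs as-is; the tables fact and daily_summary, the function summarize_day, and the sample data are hypothetical, and the INSERT OR REPLACE syntax is SQLite's (MySQL would use something like REPLACE or INSERT ... ON DUPLICATE KEY UPDATE).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact (day TEXT NOT NULL, value REAL NOT NULL);
    CREATE TABLE daily_summary (
        day       TEXT PRIMARY KEY,
        sum_value REAL    NOT NULL,
        row_count INTEGER NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO fact (day, value) VALUES (?, ?)",
    [("2023-01-30", 10.0), ("2023-01-30", 20.0),
     ("2023-01-31", 5.0), ("2023-02-01", 7.0)],
)

def summarize_day(conn, day):
    """Store one day's subtotal and row count; rerunning a day overwrites it."""
    conn.execute("""
        INSERT OR REPLACE INTO daily_summary (day, sum_value, row_count)
        SELECT day, SUM(value), COUNT(*)
        FROM fact
        WHERE day = ?
        GROUP BY day
    """, (day,))

# In production this would be one call per night for the day just finished.
for day in ("2023-01-30", "2023-01-31", "2023-02-01"):
    summarize_day(conn, day)

# Monthly report: reads one summary row per day instead of every fact row.
monthly_totals = conn.execute("""
    SELECT substr(day, 1, 7) AS ym, SUM(sum_value), SUM(row_count)
    FROM daily_summary
    GROUP BY ym
    ORDER BY ym
""").fetchall()
```

Even though the monthly query still groups on an expression, it only touches one row per day, so the cost of a temporary table or sort is small compared with scanning the fact table.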

For AVG, be sure to store SUM() and COUNT(*), then compute (in the report) SUM(sums) / SUM(counts) AS Average.
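To see why the sums and counts are stored rather than a per-day average, here is a small sketch (again sqlite3 standing in for MySQL, with a hypothetical daily_summary table and made-up numbers). One day has three rows totalling 60 and another has a single row of 100.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE daily_summary (
        day       TEXT PRIMARY KEY,
        sum_value REAL    NOT NULL,
        row_count INTEGER NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO daily_summary (day, sum_value, row_count) VALUES (?, ?, ?)",
    [("2023-01-01", 60.0, 3),    # three fact rows: 10 + 20 + 30
     ("2023-01-02", 100.0, 1)],  # one fact row: 100
)

correct_avg, avg_of_avgs = conn.execute("""
    SELECT SUM(sum_value) / SUM(row_count),  -- weighted by the number of rows
           AVG(sum_value / row_count)        -- average of daily averages
    FROM daily_summary
""").fetchone()

print(correct_avg)  # 40.0, the same as AVG over the four original rows
print(avg_of_avgs)  # 60.0, skewed because each day counts equally
```

The two results differ whenever days have different row counts, which is why the summary table keeps both components.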


 