MySQL GROUP BY using filesort - query optimization

I have a table like this:

CREATE TABLE `purchase` (
  `fact_purchase_id` binary(16) NOT NULL,
  `purchase_id` int(10) unsigned NOT NULL,
  `purchase_id_primary` int(10) unsigned DEFAULT NULL,
  `person_id` int(10) unsigned NOT NULL,
  `person_id_owner` int(10) unsigned NOT NULL,
  `service_id` int(10) unsigned NOT NULL,
  `fact_count` int(10) unsigned NOT NULL DEFAULT '0',
  `fact_type` tinyint(3) unsigned NOT NULL,
  `date_fact` date NOT NULL,
  `purchase_name` varchar(255) DEFAULT NULL,
  `activation_price` decimal(7,2) unsigned NOT NULL DEFAULT '0.00',
  `activation_price_total` decimal(7,2) unsigned NOT NULL DEFAULT '0.00',
  `renew_price` decimal(7,2) unsigned DEFAULT '0.00',
  `renew_price_total` decimal(7,2) unsigned NOT NULL DEFAULT '0.00',
  `activation_cost` decimal(7,2) unsigned DEFAULT '0.00',
  `activation_cost_total` decimal(7,2) unsigned NOT NULL DEFAULT '0.00',
  `renew_cost` decimal(7,2) unsigned DEFAULT '0.00',
  `renew_cost_total` decimal(7,2) unsigned NOT NULL DEFAULT '0.00',
  `date_created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`fact_purchase_id`),
  KEY `purchase_id_idx` (`purchase_id`),
  KEY `person_id_idx` (`person_id`),
  KEY `person_id_owner_idx` (`person_id_owner`),
  KEY `service_id_idx` (`service_id`),
  KEY `fact_type_idx` (`fact_type`),
  KEY `renew_price_idx` (`renew_price`),
  KEY `renew_cost_idx` (`renew_cost`),
  KEY `renew_price_year_idx` (`renew_price_year`),
  KEY `renew_cost_year_idx` (`renew_cost_year`),
  KEY `date_created_idx` (`date_created`),
  KEY `purchase_id_primary_idx` (`purchase_id_primary`),
  KEY `fact_count` (`fact_count`),
  KEY `renew_price_year_total_idx` (`renew_price_total`),
  KEY `renew_cost_year_total_idx` (`renew_cost_total`),
  KEY `date_fact` (`date_fact`) USING BTREE,
  CONSTRAINT `purchase_person_fk` FOREIGN KEY (`person_id`) REFERENCES `person` (`person_id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
  CONSTRAINT `purchase_person_owner_fk` FOREIGN KEY (`person_id_owner`) REFERENCES `person` (`person_id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
  CONSTRAINT `purchase_service_fk` FOREIGN KEY (`service_id`) REFERENCES `service` (`service_id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

I'm running this query:

SELECT 
    purchase.date_fact,
    UNIX_TIMESTAMP(purchase.date_fact),
    COUNT(DISTINCT purchase.purchase_id) AS Num
FROM
    purchase
WHERE
    purchase.date_fact >= '2017-01-01'
    AND purchase.date_fact <= '2017-01-31'
    AND purchase.fact_type = 3
    AND purchase.purchase_id_primary IS NULL
GROUP BY purchase.date_fact

The table contains 5,629,670 records in total. Running EXPLAIN on the query gives these results:

  • rows = 2,814,835
  • possible_keys = fact_type_idx,purchase_id_primary_idx,date_fact
  • key = fact_type_idx
  • key_len = 1
  • ref = const
  • filtered = 25.00
  • Extra = Using index condition; Using where; Using filesort

The query takes 30-35 seconds to execute, which is too long to wait.

The problem is that the GROUP BY causes a filesort. Adding ORDER BY NULL to the query doesn't change anything.

I could possibly use a covering index, but I only need date_fact in this query: which fields should the index include?

How can I avoid the filesort on GROUP BY? How can I optimize the query to make it faster?

I'm using this table for statistics purposes (OLAP). Is there perhaps a better DBMS for this purpose?

I'm running MySQL Server 5.7.17.

Thank you

For this query:

SELECT p.date_fact, UNIX_TIMESTAMP(p.date_fact),
       COUNT(DISTINCT p.purchase_id) AS Num
FROM purchase p
WHERE p.date_fact >= '2017-01-01' AND
      p.date_fact <= '2017-01-31' AND
      p.fact_type = 3 AND
      p.purchase_id_primary IS NULL
GROUP BY p.date_fact;

I would recommend a compound index on (fact_type, purchase_id_primary, date_fact, purchase_id). The first two columns have equality-style conditions in the WHERE clause (fact_type = 3 and purchase_id_primary IS NULL), so they go first. The third, date_fact, has a range condition, and it is also the GROUP BY column, so rows that match the first two columns are read from the index already ordered by date. The fourth column allows the index to "cover" the query (all columns in the query are in the index).
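As a rough sketch, the index described above could be created and checked like this. The index name purchase_type_primary_date_idx is only a placeholder, not something from the question, and building an index on a table of this size may take a while.

```sql
-- Placeholder index name; the column order follows the recommendation above:
-- equality columns first, then the range / GROUP BY column, then the counted column.
ALTER TABLE purchase
  ADD INDEX purchase_type_primary_date_idx
    (fact_type, purchase_id_primary, date_fact, purchase_id);

-- Re-run EXPLAIN to see which index the optimizer now chooses.
EXPLAIN
SELECT p.date_fact, UNIX_TIMESTAMP(p.date_fact),
       COUNT(DISTINCT p.purchase_id) AS Num
FROM purchase p
WHERE p.date_fact >= '2017-01-01' AND
      p.date_fact <= '2017-01-31' AND
      p.fact_type = 3 AND
      p.purchase_id_primary IS NULL
GROUP BY p.date_fact;
```

If the optimizer picks the new index, the hope is that Extra shows Using index and no longer shows Using filesort. That depends on the data and on the optimizer's cost estimates, so check the actual EXPLAIN output.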

I would also add: if you don't need COUNT(DISTINCT), then don't use it. purchase_id might already be unique in purchase.
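One way to test that assumption is to compare the plain row count with the distinct count for the rows in question. This is only a sketch: if the two numbers differ, purchase_id repeats and COUNT(DISTINCT) is still needed.

```sql
-- Do the filtered rows contain repeated purchase_id values?
SELECT COUNT(*) AS total_rows,
       COUNT(DISTINCT p.purchase_id) AS distinct_ids
FROM purchase p
WHERE p.date_fact >= '2017-01-01' AND
      p.date_fact <= '2017-01-31' AND
      p.fact_type = 3 AND
      p.purchase_id_primary IS NULL;

-- Only if total_rows equals distinct_ids: the same report without DISTINCT.
-- purchase_id is NOT NULL, so COUNT(*) counts the same rows.
SELECT p.date_fact, UNIX_TIMESTAMP(p.date_fact),
       COUNT(*) AS Num
FROM purchase p
WHERE p.date_fact >= '2017-01-01' AND
      p.date_fact <= '2017-01-31' AND
      p.fact_type = 3 AND
      p.purchase_id_primary IS NULL
GROUP BY p.date_fact;
```

A match for one month does not prove uniqueness for every date range, so the check would need repeating for other periods, or the uniqueness would need to be guaranteed by how the table is loaded.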
