MySQL GROUP BY optimization - InnoDB vs MyISAM?

Question

Question

Does anyone know why there is no GROUP BY optimization for MyISAM-based tables? (I'm using this version: 5.1.49-3)

Test tables

CREATE TABLE `_test2_innodb` (
    `i` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `n` smallint(5) unsigned NOT NULL,
    `t` int(10) unsigned NOT NULL,
    `v` smallint(6) NOT NULL,
    PRIMARY KEY (`i`),
    KEY `i_n` (`n`),
    KEY `i_t` (`t`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

CREATE TABLE `_test2_myisam` (
    `i` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `n` smallint(5) unsigned NOT NULL,
    `t` int(10) unsigned NOT NULL,
    `v` smallint(6) NOT NULL,
    PRIMARY KEY (`i`),
    KEY `i_n` (`n`),
    KEY `i_t` (`t`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

Test queries

SELECT MAX(i) FROM _test2_myisam GROUP BY n;

SELECT MAX(i) FROM _test2_innodb GROUP BY n;

Results

id, select_type, table, type, poss_keys, key, key_len, ref, rows, extra

1, SIMPLE, _test2_myisam , ALL, , , , , 19998, Using temporary; Using filesort

1, SIMPLE, _test2_innodb, index, , i_n, 2, , 20024, Using index

The problem is that if I'm using MyISAM, a full table scan will be done, which takes hours on large tables... And the MySQL documentation doesn't mention anything about table engines having different implementations ( http://dev.mysql.com/doc/refman/5.0/en/group-by-optimization.html ). Does anyone know why is this handled differently internally?

(Note: no, it is not a good solution to switch to InnoDB) Thanks

Answer 1

The difference between the two tables, despite the seemingly identical definitions, is that MyISAM tables are "heaps" while InnoDB tables are clustered organized tables, eg the clustered index is the table (usually the primary key, the (i) index here).

The other difference, that causes the different execution plans is that in InnoDB, all non-clustered indexes (your (n) index in this case), include also the columns of the clustered index, the (i) in this case, so no full table scan is needed.

In other words, the (n) InnoDB index is roughly equivalent to a (n, PK) MyISAM index.

The MyISAM engine has to either do a full table scan or an index scan of the (n) index and then also scan the table (to get the values of the i column). So, it chooses the first plan (full scan + filesort to find the MAX values).

Do the test again, after adding an index on (n, i) in the MyISAM table:

ALTER TABLE _test2_myisam
    ADD INDEX n_i (n, i) ;

Answer 2

This is because InnoDB are stored in an index-organized way, based on the primary key. So the table scan and the primary key scan will be the same exact thing. Unfortunately for MyISAM, this is not the case, it will have to do a filesort.

MySQL GROUP BY optimization - InnoDB vs MyISAM?

Question

Question

Test tables

Test queries

Results

2 answers

solution1
3 ACCPTED 2012-10-31 11:44:39

solution2
0 2012-10-31 11:39:36

MySQL GROUP BY optimization - InnoDB vs MyISAM?

Question

Question

Test tables

Test queries

Results

2 answers

solution1 3 ACCPTED 2012-10-31 11:44:39

solution2 0 2012-10-31 11:39:36

solution1
3 ACCPTED 2012-10-31 11:44:39

solution2
0 2012-10-31 11:39:36