MySQL database row COUNT optimization

I have a MySQL (5.6.26) database with a large amount of data, and I have a problem with a COUNT select on a table join.

This query takes about 23 seconds to execute:

SELECT COUNT(0) FROM user
LEFT JOIN blog_user ON blog_user.id_user = user.id
WHERE email IS NOT NULL
AND blog_user.id_blog = 1

The user table is MyISAM and contains user data such as id, email, name, etc.

CREATE TABLE `user` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `username` varchar(50) DEFAULT NULL,
  `email` varchar(100) DEFAULT '',
  `hash` varchar(100) DEFAULT NULL,
  `last_login` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `created` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  PRIMARY KEY (`id`),
  UNIQUE KEY `id` (`id`) USING BTREE,
  UNIQUE KEY `email` (`email`) USING BTREE,
  UNIQUE KEY `hash` (`hash`) USING BTREE,
  FULLTEXT KEY `email_full_text` (`email`)
) ENGINE=MyISAM AUTO_INCREMENT=5728203 DEFAULT CHARSET=utf8

The blog_user table is InnoDB and contains only id, id_user and id_blog (a user can have access to more than one blog). id is the PRIMARY KEY, and there are indexes on id_blog, on id_user and on (id_blog, id_user).

CREATE TABLE `blog_user` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `id_blog` int(11) NOT NULL DEFAULT '0',
  `id_user` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  UNIQUE KEY `id_blog_user` (`id_blog`,`id_user`) USING BTREE,
  KEY `id_user` (`id_user`) USING BTREE,
  KEY `id_blog` (`id_blog`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=5250695 DEFAULT CHARSET=utf8

I deleted all other tables, and there are no other connections to the MySQL server (it is a testing environment).

What I've found so far:

  1. When I delete some columns from the user table, the query gets faster (roughly 2 seconds per deleted column).
  2. When I delete all columns from the user table (except id and email), the query takes 0.6 seconds.
  3. When I also change the blog_user table to MyISAM, the query takes 46 seconds.
  4. When I change the user table to InnoDB, the query takes 0.1 seconds.

The question is: why is MyISAM so slow at executing this query?

First, some comments on your query (after fixing it up a bit):

SELECT COUNT(*)
FROM user u LEFT JOIN
     blog_user bu
     ON bu.id_user = u.id
WHERE u.email IS NOT NULL AND bu.id_blog = 1;

Table aliases make a query easier to both write and read. More importantly, you have a LEFT JOIN, but your WHERE clause turns it into an INNER JOIN. So, write it that way:

SELECT COUNT(*)
FROM user u INNER JOIN
     blog_user bu
     ON bu.id_user = u.id
WHERE u.email IS NOT NULL AND bu.id_blog = 1;

The difference is important because it affects choices that the optimizer can make.

Next, indexes will help this query. I am guessing that blog_user(id_blog, id_user) and user(id, email) are the best indexes.
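As a rough sketch of what that guess would look like against the posted schema: blog_user(id_blog, id_user) is already there as the UNIQUE KEY `id_blog_user`, so only the index on user would be new. The index name `id_email` is made up for illustration, and whether the optimizer actually uses it is something to confirm with EXPLAIN rather than assume.

```sql
-- blog_user(id_blog, id_user) already exists in the posted schema as the
-- UNIQUE KEY `id_blog_user`, so only the index on `user` would be new.
-- `id_email` is a hypothetical index name, not something from the question.
ALTER TABLE `user` ADD INDEX `id_email` (`id`, `email`);

-- Re-check the plan afterwards to see which indexes the optimizer picks.
EXPLAIN
SELECT COUNT(*)
FROM user u INNER JOIN
     blog_user bu
     ON bu.id_user = u.id
WHERE u.email IS NOT NULL AND bu.id_blog = 1;
```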

The number of columns affects your original query because the query is doing a lot of I/O. The fewer columns there are, the fewer pages are needed to store the records, and the faster the query runs. Proper indexes should work better and more consistently.

To answer the real question (why is MyISAM slower than InnoDB), I can't give an authoritative answer.

But it is certainly related to one of the more important differences between the two storage engines: InnoDB supports foreign keys and MyISAM doesn't. Foreign keys are important for joining tables.

I don't know if defining a foreign key constraint will improve speed further, but for sure, it will guarantee data consistency.

Another note: you observed that the time decreases as you delete columns. This indicates that the query requires a full table scan, which can be avoided by creating an index on the email column. user.id and blog_user.id_user hopefully already have an index; if they don't, that is an error. Columns that participate in a foreign key relationship, explicit or not, must always have an index.

This comes too long after the event to be of much use to the OP, and all the foregoing suggestions for speeding up the query are entirely appropriate, but I wonder why no one has remarked on the output of EXPLAIN: specifically, why the index on email was chosen and how that relates to the definition of the email column in the user table.

The optimizer has selected the index on the email column, presumably because the column appears in the WHERE clause. key_len for this index is comparatively long, and judging by the AUTO_INCREMENT value this is a reasonably large table, so the memory requirements for this index are considerably greater than if the optimizer had chosen the id column (303 bytes against 4 bytes). The email column is nullable but has a default of the empty string, so unless the application explicitly sets a NULL, you are not going to find any NULLs in this column anyway. Neither will you find more than one record with the default value, given the UNIQUE constraint. The column DEFAULT and the UNIQUE constraint appear to be completely at odds with each other.
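For anyone wondering where the 303 comes from, it follows from the column definition in the question (varchar(100), CHARSET=utf8, nullable). The breakdown below is just the arithmetic, written as a query so it can be pasted into a client; the column aliases are made up.

```sql
-- key_len for an index on `email` varchar(100), CHARSET=utf8, nullable:
--   100 characters * 3 bytes (the most MySQL's utf8 uses per character)
--   + 2 bytes for the VARCHAR length prefix
--   + 1 byte for the NULL flag
SELECT 100 * 3 + 2 + 1 AS email_key_len,  -- 303
       4               AS id_key_len;     -- INT NOT NULL primary key
```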

Given the above, and the fact we only want the count in the query, I'd then wonder if the email part of the where clause serves any purpose other than slowing the query down as each value is compared to NULL. Without it the optimizer would probably pick the primary key and do a much better job. Better yet would be a query which ignored the user table entirely and took the count based on the covering index on blog_user that Gordon Linoff highlighted.
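A minimal sketch of that last idea, assuming every blog_user row for the blog points at an existing user whose email is not NULL. Nothing in the posted schema enforces that (there is no foreign key), so the second query is a sanity check for rows that would make the two counts differ.

```sql
-- Count taken from blog_user alone; the (id_blog, id_user) index covers it.
SELECT COUNT(*)
FROM blog_user
WHERE id_blog = 1;

-- Sanity check: blog_user rows for this blog with no matching user, or with
-- a user whose email is NULL. If this returns 0, the count above matches the
-- original join query.
SELECT COUNT(*)
FROM blog_user bu LEFT JOIN
     user u
     ON u.id = bu.id_user
WHERE bu.id_blog = 1 AND (u.id IS NULL OR u.email IS NULL);
```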

There is another indexing issue here worth mentioning:

On the user table

 UNIQUE KEY `id` (`id`) USING BTREE,

is redundant since id is the PRIMARY KEY and therefore UNIQUE by definition.
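Dropping it would look something like this, using the index name from the posted CREATE TABLE. It is worth listing the indexes first to confirm the name on your server.

```sql
-- List the indexes currently defined on the table.
SHOW INDEX FROM `user`;

-- Drop the redundant UNIQUE KEY `id`; the PRIMARY KEY on `id` stays in place.
ALTER TABLE `user` DROP INDEX `id`;
```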

To answer your last question, "why is MyISAM so slow executing the command?": MyISAM depends on the speed of your hard drive, whereas InnoDB, once the data has been read, runs at the speed of RAM. The first run of the query may be loading data from disk; the second and later runs avoid the hard drive until the data ages out of RAM.
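If you want to look at this on your own server, something along these lines shows the two caches involved and makes the engine change from point 4 of the question. This is only a sketch: the MyISAM key buffer holds index blocks only (data rows rely on the operating system's file cache), while the InnoDB buffer pool holds both data and index pages, and the right sizes depend on your machine.

```sql
-- MyISAM: cache for index blocks only; row data is left to the OS file cache.
SHOW VARIABLES LIKE 'key_buffer_size';

-- InnoDB: cache for both data and index pages.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Convert the table, as in point 4 of the question.
ALTER TABLE `user` ENGINE=InnoDB;
```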