Performance of sorting and DISTINCT: MySQL vs. PHP

In a situation like this, which method (or mix of methods) performs fastest?

$year = db_get_fields("select distinct year from car_cache order by year desc");

Or

$year = db_get_fields("select year from car_cache");
$year = array_unique($year);
rsort($year); // descending, to match ORDER BY year DESC in the first version

I've heard that DISTINCT in MySQL is a big performance hit for large queries, and this table can have a million rows or more. I also wondered which storage engine, InnoDB or MyISAM, would work best. I know many optimizations are very query dependent. Year is an unsigned number, but the other fields are varchars of different lengths, which I know may make a difference too. For example:

$line = db_get_fields("select distinct line from car_cache where year='$postyear' and make='$postmake' order by line desc");

I read that using the new InnoDB multiple-key method can make queries like this one very quick, but the DISTINCT and ORDER BY clauses are red flags to me.

Have MySQL do as much of the work as possible. If it isn't efficient at what it's doing, then something likely isn't set up correctly, whether that is proper indexing for the query you are trying to run or settings such as the sort buffers.

If you have an index on the year column, then DISTINCT should be efficient. If you do not, a full table scan is necessary to fetch the distinct values. If you instead eliminate the duplicates in PHP rather than MySQL, you transmit (potentially) much more data from MySQL to PHP, and PHP consumes much more memory storing all of that data before the duplicates are removed.
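
As a rough sketch of the indexing advice above, assuming the car_cache table from the question and a hypothetical index name (idx_year), the index could be added and the query plan checked like this:

```sql
-- Hypothetical index name; the table and column come from the question.
ALTER TABLE car_cache ADD INDEX idx_year (year);

-- Check which index MySQL picks for the DISTINCT query.
EXPLAIN SELECT DISTINCT year FROM car_cache ORDER BY year DESC;
```

If the key column of the EXPLAIN output names the new index, the distinct values are being read from the index rather than from a scan of every row.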

Here is some sample output from a dev database I have. Note that this database is on a different server on the network from the one where the queries are executed.

SELECT COUNT(SerialNumber) FROM `readings`;
> 97698592

SELECT SQL_NO_CACHE DISTINCT `SerialNumber`
FROM `readings`
ORDER BY `SerialNumber` DESC
LIMIT 10000;
> Fetched 10000 records.  Duration: 0.801 sec, fetched in: 0.082 sec

> EXPLAIN *above_query*
+----+-------------+----------+-------+---------------+---------+---------+------+------+-----------------------------------------------------------+
| id | select_type | table    | type  | possible_keys | key     | key_len | ref  | rows | Extra                                                     |
+----+-------------+----------+-------+---------------+---------+---------+------+------+-----------------------------------------------------------+
|  1 | SIMPLE      | readings | range | NULL          | PRIMARY | 18      | NULL |   19 | Using index for group-by; Using temporary; Using filesort |
+----+-------------+----------+-------+---------------+---------+---------+------+------+-----------------------------------------------------------+

If I attempt the same query but replace the SerialNumber column with one that is not indexed, it takes forever to run because MySQL has to examine all 97 million rows.

Some of the efficiency has to do with how much data you expect to get back. If I slightly modify the above query to operate on the time column (the timestamp of the reading), it takes 1 minute 40 seconds to get a distinct list of 273,505 times. Most of the overhead there is in transferring all the records over the network. So keep in mind how much data you are getting back, and keep it as low as possible for what you are trying to fetch.
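
One way to act on this, sketched under the assumption that the timestamp column is literally named time and that only a recent window is needed (the cutoff date below is a made-up placeholder), is to check how many distinct values exist and then fetch only a bounded slice:

```sql
-- How many rows would the DISTINCT query send back?
-- (On a non-indexed column this count is itself slow.)
SELECT COUNT(DISTINCT `time`) FROM `readings`;

-- Fetch only a bounded slice instead of every distinct value.
SELECT DISTINCT `time`
FROM `readings`
WHERE `time` >= '2012-01-01 00:00:00'  -- placeholder cutoff, not from the original data
ORDER BY `time` DESC
LIMIT 10000;
```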

As for your final query:

select distinct line from car_cache
where year='$postyear' and make='$postmake'
order by line desc

There should be no problem with that either. Just make sure you have a compound index on year and make, and possibly an index on line.
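
A minimal sketch of such an index, with hypothetical index names. The second statement is an assumption rather than something measured here: extending the compound index to include line may let MySQL serve the filter, the DISTINCT, and the ORDER BY from a single index, which is often more useful than a separate single-column index on line.

```sql
-- Hypothetical index name; the columns come from the query above.
ALTER TABLE car_cache ADD INDEX idx_year_make (year, make);

-- Alternative (assumption): also cover line so the whole query
-- can be answered from one index.
ALTER TABLE car_cache ADD INDEX idx_year_make_line (year, make, line);
```

Running EXPLAIN on the query before and after adding either index would show whether MySQL actually uses it.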

On a final note, the engine I am using for the readings table is InnoDB, and my server is 5.5.23-55-log Percona Server (GPL), Release 25.3, which is a version of MySQL by Percona Inc.

Hope that helps.
