MySQL Query performance issue using group by

I have the following SQL. It takes about 95 seconds to execute. The table has approximately 25 million rows.

-- Reference point: the pin with user_id = 0 and board_id = 0
SET @lat=(select latitude from skoovy_prd.pins where user_id=0 and board_id=0 limit 1);
SET @lng=(select longitude from skoovy_prd.pins where user_id=0 and board_id=0 limit 1);
-- One row per category: the highest pin_id among pins within 25 miles
-- (3959 is Earth's radius in miles, so the distance is in miles)
SELECT category_id, MAX(pin_id)
FROM skoovy_prd.pins
WHERE ( 3959 * acos( cos( radians(@lat) ) * cos( radians( latitude ) )
* cos( radians( longitude ) - radians(@lng) ) + sin( radians(@lat) ) * sin(radians(latitude)) ) ) <=25
GROUP BY category_id
ORDER BY category_id DESC
LIMIT 12;

category_id, latitude, longitude, and pin_id each have a BTREE index.

Is there a more efficient way to write this so I get records back much faster? The purpose is to get a record set in which each record has a distinct category. I got this SQL after posting the question "mysql selecting records but ensuring data in one column is distinct", which was marked as a duplicate of "Retrieving the last record in each group".

One of the answers there, by newtlover, led me to the SQL posted here. (Even though I'm not really looking for the last record in each group, it at least returns records whose category_id is distinct within the result set.)

I'm hoping there's a way to improve the performance of this query. Any suggestions for avoiding the whole "last record in each group" approach would also be appreciated. I am NOT a SQL person by any means, so I'm grasping at straws here.

You can't expect a query to use an index if the indexed columns are buried deep inside expressions. That prevents the use of indexes, because the optimizer has no way of knowing whether the result of the expression sorts in the same order as the index.

Distance formulas are especially difficult to optimize with B-trees, because the B-tree is sorted primarily along one axis.

The upshot is that your WHERE clause has to evaluate the expensive trig functions on all 25 million rows, instead of using the index to reduce the result set first.
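To make the index argument concrete, here is a toy sketch in Python. The latitudes are made up, and a sorted Python list stands in for a B-tree index on latitude (an assumption for illustration; it is not how MySQL stores index pages). A plain range on the column maps to one contiguous run found by binary search, while a predicate on cos(radians(latitude)) has a different sort order and ends up evaluated once per row.

```python
import bisect
import math

# Made-up latitudes; a sorted list stands in for a B-tree index on `latitude`.
index = sorted([-60.0, -34.0, 0.0, 34.0, 60.0])

# Plain column predicate (latitude BETWEEN 30 AND 40): the matching keys form
# one contiguous run in index order, so two binary searches locate it.
lo = bisect.bisect_left(index, 30.0)
hi = bisect.bisect_right(index, 40.0)
in_range = index[lo:hi]

# Sort order of an expression over the column: cos() is symmetric around 0,
# so ordering by cos(radians(latitude)) differs from ordering by latitude.
by_expression = sorted(index, key=lambda lat: math.cos(math.radians(lat)))

# Predicate on the expression: the index gives no hint where matches sit,
# so the trig expression is evaluated once for every row (a full scan).
evaluations = 0
matches = []
for lat in index:
    evaluations += 1
    if math.cos(math.radians(lat)) > 0.8:
        matches.append(lat)

print(in_range)              # [34.0]
print(matches, evaluations)  # [-34.0, 0.0, 34.0] 5
```

With 25 million rows instead of five, the second pattern is what the original WHERE clause forces.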

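One solution is to use a bounding box to reduce the scope of the search. If you know @lat, you can put a plain range condition on the indexed column ahead of the trig expression, for example WHERE latitude BETWEEN @lat - 25/69 AND @lat + 25/69 AND ...trig expression... The radius has to be converted from miles to degrees here (one degree of latitude is about 69 miles). MySQL can use the latitude index for the BETWEEN range, so the expensive trig expression only needs to be evaluated for rows inside that band rather than for all 25 million.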
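A minimal Python sketch of the bounding-box idea, assuming the radius is in miles and using the same 3959-mile Earth radius as the question's query. The box half-width must be in degrees, not miles: about 25 / 69 ≈ 0.36 degrees of latitude, and somewhat more in longitude because meridians converge toward the poles. The function names and sample coordinates are hypothetical, and the box ignores poles and the 180th meridian.

```python
import math

EARTH_RADIUS_MILES = 3959.0  # same constant as the question's query
# Length of one degree of latitude on a sphere of that radius (about 69.1 miles).
MILES_PER_DEGREE_LAT = 2 * math.pi * EARTH_RADIUS_MILES / 360


def bounding_box(lat, lng, radius_miles):
    """(min_lat, max_lat, min_lng, max_lng) of a box enclosing the search circle.

    Assumes the circle does not reach a pole or cross the 180th meridian.
    """
    dlat = radius_miles / MILES_PER_DEGREE_LAT
    # Degrees of longitude shrink by cos(latitude) away from the equator.
    dlng = radius_miles / (MILES_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return lat - dlat, lat + dlat, lng - dlng, lng + dlng


def great_circle_miles(lat1, lng1, lat2, lng2):
    """Spherical law of cosines, the same formula as the question's WHERE clause."""
    c = (math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.cos(math.radians(lng2) - math.radians(lng1))
         + math.sin(math.radians(lat1)) * math.sin(math.radians(lat2)))
    # Clamp so floating-point rounding cannot push acos() outside [-1, 1].
    return EARTH_RADIUS_MILES * math.acos(max(-1.0, min(1.0, c)))


def within_radius(center, points, radius_miles):
    """Cheap box comparisons first; the trig formula only for points in the box."""
    lat0, lng0 = center
    min_lat, max_lat, min_lng, max_lng = bounding_box(lat0, lng0, radius_miles)
    return [
        (lat, lng)
        for lat, lng in points
        if min_lat <= lat <= max_lat
        and min_lng <= lng <= max_lng
        and great_circle_miles(lat0, lng0, lat, lng) <= radius_miles
    ]


# Hypothetical reference point and pins.
center = (34.05, -118.25)
pins = [(34.10, -118.30), (34.40, -118.25), (34.40, -117.82), (40.70, -74.00)]
print(within_radius(center, pins, 25))  # [(34.1, -118.3), (34.4, -118.25)]
```

The third pin lies inside the box but about 34 miles away, which is why the exact trig test is still needed after the box filter; the box only shrinks how many rows reach it.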

Unfortunately, you can't use a single B-tree lookup to filter on both latitude and longitude at once, even with a compound index. Think of it this way: I ask you to look up everyone in a phone book whose last name begins with "S" and whose first name begins with "J". The phone book is like an index on (lastname, firstname), but first names are sorted only within each individual last name, so the "J" first names are scattered throughout the "S" section. You end up scanning every "S" last name, just as if only that column were indexed.
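The phone-book analogy can be sketched in Python, assuming a list of (lastname, firstname) tuples sorted lexicographically as a stand-in for a compound B-tree index. The names are invented. Binary search narrows the "S" prefix, but the "J" first names inside that range are not contiguous, so every entry in it still has to be checked.

```python
import bisect

# Hypothetical phone book: a compound index on (lastname, firstname),
# modeled as tuples sorted by last name, then first name.
phone_book = sorted([
    ("Adams", "Jane"), ("Smith", "Alice"), ("Smith", "John"),
    ("Stone", "Bob"), ("Stone", "Judy"), ("Taylor", "Jack"),
])

# Step 1: last names starting with "S" form one contiguous run, so binary
# search finds its bounds (("T",) works as the upper bound for ASCII capitals).
start = bisect.bisect_left(phone_book, ("S",))
end = bisect.bisect_left(phone_book, ("T",))
s_block = phone_book[start:end]

# Step 2: first names starting with "J" are sorted only within each last name,
# so they are scattered through the run and every "S" entry must be examined.
matches = [entry for entry in s_block if entry[1].startswith("J")]

print(len(s_block), matches)  # 4 [('Smith', 'John'), ('Stone', 'Judy')]
```

Latitude and longitude in a compound index behave the same way: the second column only helps within a single value of the first.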

There are technologies other than B-trees that make these kinds of multidimensional searches easier. One is Sphinx Search; see "An introduction to distance-based searching in Sphinx".

Another option is the built-in spatial features of MySQL 5.6, but in that version spatial indexes are supported only for MyISAM tables (which I usually recommend against using). InnoDB supports SPATIAL indexes as of MySQL 5.7.5.

See Alexander Rubin's excellent resources on geospatial searches in MySQL.

The math causes a full table scan every time. If you can precompute and store its result (e.g. with a cron job), you should do so. Another option is to add some other indexed condition before the math to reduce the number of rows examined.
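A rough sketch of the "store its result" suggestion, assuming the reference point (the pin with user_id = 0 and board_id = 0) changes rarely. SQLite from Python's standard library stands in for MySQL here, and the distance_miles column, idx_pins_distance index, and refresh_distances function are hypothetical names. The batch step would need to rerun whenever pins are added or the reference point moves.

```python
import math
import sqlite3

EARTH_RADIUS_MILES = 3959.0  # same constant as the question's query


def great_circle_miles(lat1, lng1, lat2, lng2):
    """Spherical law of cosines, as in the question's WHERE clause."""
    c = (math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.cos(math.radians(lng2) - math.radians(lng1))
         + math.sin(math.radians(lat1)) * math.sin(math.radians(lat2)))
    return EARTH_RADIUS_MILES * math.acos(max(-1.0, min(1.0, c)))


def refresh_distances(conn, ref_lat, ref_lng):
    """Batch step (e.g. run from cron): store each pin's distance from the
    fixed reference point in an indexed column."""
    rows = conn.execute("SELECT pin_id, latitude, longitude FROM pins").fetchall()
    conn.executemany(
        "UPDATE pins SET distance_miles = ? WHERE pin_id = ?",
        [(great_circle_miles(ref_lat, ref_lng, lat, lng), pin_id)
         for pin_id, lat, lng in rows],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pins (pin_id INTEGER PRIMARY KEY, category_id INTEGER,"
    " latitude REAL, longitude REAL, distance_miles REAL)"
)
conn.execute("CREATE INDEX idx_pins_distance ON pins (distance_miles)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO pins (pin_id, category_id, latitude, longitude) VALUES (?, ?, ?, ?)",
    [(1, 10, 34.10, -118.30), (2, 10, 34.12, -118.20),
     (3, 20, 34.40, -118.25), (4, 30, 40.70, -74.00)],
)
refresh_distances(conn, 34.05, -118.25)

# Per-request query: a plain range on an indexed column, no trig per row.
result = conn.execute(
    "SELECT category_id, MAX(pin_id) FROM pins WHERE distance_miles <= ?"
    " GROUP BY category_id ORDER BY category_id DESC LIMIT 12",
    (25,),
).fetchall()
print(result)  # [(20, 3), (10, 2)]
```

The trade-off is that the trig cost moves from every request to the batch job, which only pays off while the reference point stays fixed.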
