
MySQL query for the top N entries of grouped records

I'm new to MySQL and to databases in general. I built a query by piecing together snippets from online resources and by trial and error. It is really slow (27 seconds), and I assume it can be optimized. Maybe someone could help me out with that.

This is the data structure of my MySQL database (version 5.1.51-0):

|- purchaseID -|- customerID -|- emotionID -|- customerCountryCode -|- customerContinentCode-|
|     1        |     2345     |     0       |        US             |            NA          |
|     2        |     2345     |     3       |        US             |            NA          |
|     3        |     4456     |     0       |        UK             |            EU          |
|     3        |     4456     |     5       |        UK             |            EU          |
|     4        |     4456     |     2       |        UK             |            EU          |
|     5        |     4456     |     2       |        UK             |            EU          |
|     6        |     1234     |     0       |        US             |            NA          |
|     7        |     6678     |     0       |        US             |            NA          |
|     8        |     9900     |     0       |        US             |            NA          |
|     9        |     3334     |     0       |        US             |            NA          |    
|     10       |     3334     |     4       |        US             |            NA          |

The database is used to save all the purchases that are made. For every purchase, the customerID and the country and continent the customer comes from are saved. The customer can also rate the purchase with one of 6 emotions (happy, disappointed, ...). The chosen emotion is saved as emotionID.

So now I need a query that returns the top 6 customers for a certain emotionID, together with the percentage of each customer's purchases that were rated with that emotion. Assuming I look for emotionID = 0, this is what I would like to get:

|- customerID -|- emotionPercent -|
|     1234     |        100       |     
|     6678     |        100       |     
|     9900     |        100       | 
|     2345     |        50        |     
|     3334     |        50        | 
|     4456     |        25        |    
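To make the expected numbers concrete: emotionPercent is the number of a customer's purchases rated with the requested emotionID, divided by that customer's total number of purchases, times 100 (customer 4456 has 1 of 4 purchases with emotionID 0, hence 25). Below is a minimal Python sketch over the sample rows, not taken from the thread; `emotion_percent` is a hypothetical helper, and ties are ordered by customerID only to make the result deterministic.

```python
from collections import Counter

# Sample rows from the table above:
# (purchaseID, customerID, emotionID, customerCountryCode, customerContinentCode)
ROWS = [
    (1, 2345, 0, "US", "NA"), (2, 2345, 3, "US", "NA"),
    (3, 4456, 0, "UK", "EU"), (3, 4456, 5, "UK", "EU"),
    (4, 4456, 2, "UK", "EU"), (5, 4456, 2, "UK", "EU"),
    (6, 1234, 0, "US", "NA"), (7, 6678, 0, "US", "NA"),
    (8, 9900, 0, "US", "NA"), (9, 3334, 0, "US", "NA"),
    (10, 3334, 4, "US", "NA"),
]


def emotion_percent(rows, emotion_id, limit=6):
    """Return up to `limit` (customerID, percent) pairs, highest percent first."""
    totals = Counter(row[1] for row in rows)                        # purchases per customer
    hits = Counter(row[1] for row in rows if row[2] == emotion_id)  # purchases with the emotion
    ranked = sorted(
        ((cid, 100.0 * hits[cid] / totals[cid]) for cid in hits),
        key=lambda pair: (-pair[1], pair[0]),  # percent descending, then customerID
    )
    return ranked[:limit]


for customer_id, percent in emotion_percent(ROWS, 0):
    print(customer_id, percent)
```

It prints the same six customers and percentages as the expected table, in the same order.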

I'm using this query:

SELECT customers.customerID, Count( customers.emotionID ) / C.totalPeople * 100.0 AS emotionPercent 
FROM `customers` 
INNER JOIN 

    (SELECT customers.customerID, Count( customers.emotionID ) AS totalPeople
    FROM `customers` 
    GROUP BY customerID) C 

ON customers.customerID = C.customerID 
WHERE customers.emotionID = 0 
GROUP BY customers.customerID 
ORDER BY emotionPercent DESC 
LIMIT 0,6
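To experiment with the query without a MySQL server, here is a sketch that loads the sample rows into an in-memory SQLite database through Python's standard sqlite3 module and runs the query above. Using SQLite as a stand-in for MySQL 5.1 is an assumption for illustration, and two small changes were needed: SQLite's `/` truncates when both operands are integers, so `100.0` is multiplied first (MySQL's `/` always returns a decimal, so the original operand order is fine there), and `customerID` is added to ORDER BY to make ties deterministic. The constant and function names are hypothetical.

```python
import sqlite3

# Sample rows from the question (purchaseID 3 appears twice there, so no primary key).
ROWS = [
    (1, 2345, 0, "US", "NA"), (2, 2345, 3, "US", "NA"),
    (3, 4456, 0, "UK", "EU"), (3, 4456, 5, "UK", "EU"),
    (4, 4456, 2, "UK", "EU"), (5, 4456, 2, "UK", "EU"),
    (6, 1234, 0, "US", "NA"), (7, 6678, 0, "US", "NA"),
    (8, 9900, 0, "US", "NA"), (9, 3334, 0, "US", "NA"),
    (10, 3334, 4, "US", "NA"),
]

# The question's query, adapted for SQLite: 100.0 comes first to avoid integer
# division, and customerID breaks ties in ORDER BY.
TOP_CUSTOMERS_SQL = """
SELECT customers.customerID,
       100.0 * COUNT(customers.emotionID) / C.totalPeople AS emotionPercent
FROM customers
INNER JOIN (
    SELECT customerID, COUNT(emotionID) AS totalPeople
    FROM customers
    GROUP BY customerID
) C ON customers.customerID = C.customerID
WHERE customers.emotionID = ?
GROUP BY customers.customerID
ORDER BY emotionPercent DESC, customers.customerID
LIMIT 0, 6
"""


def load_sample(conn):
    """Create the customers table and fill it with the sample rows."""
    conn.execute(
        "CREATE TABLE customers (purchaseID INTEGER, customerID INTEGER, "
        "emotionID INTEGER, customerCountryCode TEXT, customerContinentCode TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?, ?)", ROWS)


conn = sqlite3.connect(":memory:")
load_sample(conn)
result = conn.execute(TOP_CUSTOMERS_SQL, (0,)).fetchall()
print(result)
```

On the sample data this reproduces the expected table; it says nothing about how the query performs on 140,000 rows.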

I have searched for answers, but the additional percentage calculation is throwing me off. I have found some solutions that require populating some sort of temporary table, but I couldn't get them to work.
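For the percentage calculation itself, one option that is not mentioned in the thread is conditional aggregation: a single GROUP BY counts all of a customer's purchases and, in the same pass, only those rated with the wanted emotion, so neither a self-join nor a temporary table is needed. A minimal sketch, again using Python's sqlite3 module as a stand-in for MySQL (the CASE expression is standard SQL); the constant name is hypothetical. The trade-off is that every row is read once, because the emotionID filter moves from WHERE into the aggregate, and the HAVING clause drops customers who never chose the emotion.

```python
import sqlite3

# Sample rows from the question:
# (purchaseID, customerID, emotionID, customerCountryCode, customerContinentCode)
ROWS = [
    (1, 2345, 0, "US", "NA"), (2, 2345, 3, "US", "NA"),
    (3, 4456, 0, "UK", "EU"), (3, 4456, 5, "UK", "EU"),
    (4, 4456, 2, "UK", "EU"), (5, 4456, 2, "UK", "EU"),
    (6, 1234, 0, "US", "NA"), (7, 6678, 0, "US", "NA"),
    (8, 9900, 0, "US", "NA"), (9, 3334, 0, "US", "NA"),
    (10, 3334, 4, "US", "NA"),
]

# One pass: numerator and denominator come from the same GROUP BY.
# Multiplying by 100.0 first avoids SQLite's integer division.
SINGLE_PASS_SQL = """
SELECT customerID,
       100.0 * SUM(CASE WHEN emotionID = :emotion THEN 1 ELSE 0 END) / COUNT(*)
           AS emotionPercent
FROM customers
GROUP BY customerID
HAVING SUM(CASE WHEN emotionID = :emotion THEN 1 ELSE 0 END) > 0
ORDER BY emotionPercent DESC, customerID
LIMIT 6
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (purchaseID INTEGER, customerID INTEGER, "
    "emotionID INTEGER, customerCountryCode TEXT, customerContinentCode TEXT)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?, ?)", ROWS)

# The named parameter :emotion is bound once and used in both places.
result = conn.execute(SINGLE_PASS_SQL, {"emotion": 0}).fetchall()
print(result)
```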

The problem is: right now there are 140,000 entries in the database, and this query takes about 27 seconds. Can this be right? Would using SQL Server increase the speed significantly?

What I don't get is this: asking for the happiest country in the world is lightning fast (0.4 seconds), although that query is structurally similar to the first one (27 seconds):

SELECT customers.customerCountryCode, Count( customers.emotionID ) / C.totalPeople * 100.0 AS emotionPercent 
FROM `customers` 
INNER JOIN 

    (SELECT customers.customerCountryCode, Count( customers.emotionID ) AS totalPeople
    FROM `customers` 
    GROUP BY customerCountryCode) C 

ON customers.customerCountryCode = C.customerCountryCode 
WHERE customers.emotionID = 0 
GROUP BY customers.customerCountryCode 
ORDER BY emotionPercent DESC 
LIMIT 0,6

When I change the GROUP BY of the inner query in this example to customerID, the query also takes forever. So it's the grouping by customerID that causes the problem. But why?

customerCountryCode is defined as varchar(2), and customerID is an int(11). Is this causing the huge difference in query performance? Is there a more appropriate data type? The customerID can have up to 8 digits.

A lot of questions! Thanks for reading and for any help!

First off, if you expect the number of entries in your database to balloon, or if you already have many entries and the server is slow as it is, IMHO you would want to preprocess the data and store the summarized results in another database. That way, you wouldn't have to run the same computation over and over again. Also, try using caching plugins for your app: memcache for PHP or Ehcache on J2EE would be safe bets.
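The preprocessing idea can be sketched as a summary table that is rebuilt periodically (how often, and what triggers the rebuild, are assumptions left open here), so the per-request query only filters and sorts precomputed rows. The answer suggests a separate database; for brevity this sketch keeps the summary in a separate table of the same in-memory SQLite database, and the table, index and function names are hypothetical.

```python
import sqlite3

# Sample rows from the question:
# (purchaseID, customerID, emotionID, customerCountryCode, customerContinentCode)
ROWS = [
    (1, 2345, 0, "US", "NA"), (2, 2345, 3, "US", "NA"),
    (3, 4456, 0, "UK", "EU"), (3, 4456, 5, "UK", "EU"),
    (4, 4456, 2, "UK", "EU"), (5, 4456, 2, "UK", "EU"),
    (6, 1234, 0, "US", "NA"), (7, 6678, 0, "US", "NA"),
    (8, 9900, 0, "US", "NA"), (9, 3334, 0, "US", "NA"),
    (10, 3334, 4, "US", "NA"),
]


def rebuild_summary(conn):
    """Expensive step, run occasionally: precompute every (customer, emotion) percentage."""
    conn.executescript("""
        DROP TABLE IF EXISTS customer_emotion_summary;
        CREATE TABLE customer_emotion_summary AS
        SELECT e.customerID, e.emotionID,
               100.0 * e.hits / t.total AS emotionPercent
        FROM (SELECT customerID, emotionID, COUNT(*) AS hits
              FROM customers GROUP BY customerID, emotionID) AS e
        JOIN (SELECT customerID, COUNT(*) AS total
              FROM customers GROUP BY customerID) AS t
          ON e.customerID = t.customerID;
        CREATE INDEX idx_summary_emotion
            ON customer_emotion_summary (emotionID, emotionPercent);
    """)


def top_customers(conn, emotion_id, limit=6):
    """Cheap step, run per request: filter and sort the precomputed rows."""
    return conn.execute(
        "SELECT customerID, emotionPercent FROM customer_emotion_summary "
        "WHERE emotionID = ? ORDER BY emotionPercent DESC, customerID LIMIT ?",
        (emotion_id, limit),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (purchaseID INTEGER, customerID INTEGER, "
    "emotionID INTEGER, customerCountryCode TEXT, customerContinentCode TEXT)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?, ?)", ROWS)
rebuild_summary(conn)
print(top_customers(conn, 0))
```

The results are only as fresh as the last rebuild, which is the usual price of this approach.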

Your problem might be that you are using subqueries. Since the result of a subquery in the FROM clause (a derived table) has no indexes of its own, joining against it falls back to the slowest join method possible (i.e., a full table scan). I am not experienced enough to offer an SQL-only solution, so I would recommend breaking the query down into two separate calls:

  1. Get the average emotion for each customer and select the top 6, saving them into a hash or object.
  2. Get those 6 customers via WHERE customerID IN (id1, id2, id3, etc).

Although this probably isn't the prettiest solution, you avoid the index-less subquery (and the very slow full table scan).
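The answer leaves the details of the two calls open, so the following is only one possible reading, sketched with Python's sqlite3 module as a stand-in for MySQL: the first call counts each customer's purchases with the requested emotion and keeps them in a dict, the second fetches total purchases only for those customers via `WHERE customerID IN (...)`, and the percentages and top 6 are computed in application code. The function name is hypothetical. With many matching customers the IN list grows large, and databases cap the number of bound parameters, so a real implementation might need to split it into chunks.

```python
import sqlite3

# Sample rows from the question:
# (purchaseID, customerID, emotionID, customerCountryCode, customerContinentCode)
ROWS = [
    (1, 2345, 0, "US", "NA"), (2, 2345, 3, "US", "NA"),
    (3, 4456, 0, "UK", "EU"), (3, 4456, 5, "UK", "EU"),
    (4, 4456, 2, "UK", "EU"), (5, 4456, 2, "UK", "EU"),
    (6, 1234, 0, "US", "NA"), (7, 6678, 0, "US", "NA"),
    (8, 9900, 0, "US", "NA"), (9, 3334, 0, "US", "NA"),
    (10, 3334, 4, "US", "NA"),
]


def top_customers_two_step(conn, emotion_id, limit=6):
    # Call 1: purchases with the requested emotion per customer, kept in a dict.
    hits = dict(conn.execute(
        "SELECT customerID, COUNT(*) FROM customers "
        "WHERE emotionID = ? GROUP BY customerID",
        (emotion_id,),
    ))
    if not hits:
        return []
    # Call 2: total purchases, fetched only for those customers via IN (...).
    placeholders = ", ".join("?" * len(hits))  # "?, ?, ..." one per customer
    totals = dict(conn.execute(
        f"SELECT customerID, COUNT(*) FROM customers "
        f"WHERE customerID IN ({placeholders}) GROUP BY customerID",
        tuple(hits),
    ))
    # Percentages and ranking happen in application code.
    ranked = sorted(
        ((cid, 100.0 * hits[cid] / totals[cid]) for cid in hits),
        key=lambda pair: (-pair[1], pair[0]),
    )
    return ranked[:limit]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (purchaseID INTEGER, customerID INTEGER, "
    "emotionID INTEGER, customerCountryCode TEXT, customerContinentCode TEXT)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?, ?)", ROWS)
print(top_customers_two_step(conn, 0))
```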

Thanks for your help!

The guys from the MySQL forum suggested adding some indices:

ALTER TABLE customers
  ADD KEY idx_country_emid (customerCountryCode, emotionID),
  ADD KEY idx_emid_custid (emotionID, customerID);

The query time dropped from 27 seconds to 0.1 seconds. ;)
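One way to check that such an index is actually picked up is to ask the query planner; in MySQL that is EXPLAIN, whose key column names the chosen index. The sketch below uses SQLite's EXPLAIN QUERY PLAN as a stand-in (an assumption for illustration; SQLite has no ALTER TABLE ... ADD KEY, so CREATE INDEX with the same index names is used). A plausible reading of the speed-up, not confirmed in the thread, is that (emotionID, customerID) lets the `WHERE emotionID = 0` filter and the per-customer grouping and join be served from the index instead of by scanning.

```python
import sqlite3

# Sample rows from the question:
# (purchaseID, customerID, emotionID, customerCountryCode, customerContinentCode)
ROWS = [
    (1, 2345, 0, "US", "NA"), (2, 2345, 3, "US", "NA"),
    (3, 4456, 0, "UK", "EU"), (3, 4456, 5, "UK", "EU"),
    (4, 4456, 2, "UK", "EU"), (5, 4456, 2, "UK", "EU"),
    (6, 1234, 0, "US", "NA"), (7, 6678, 0, "US", "NA"),
    (8, 9900, 0, "US", "NA"), (9, 3334, 0, "US", "NA"),
    (10, 3334, 4, "US", "NA"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (purchaseID INTEGER, customerID INTEGER, "
    "emotionID INTEGER, customerCountryCode TEXT, customerContinentCode TEXT)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?, ?)", ROWS)

# SQLite equivalent of the suggested ALTER TABLE ... ADD KEY statements.
conn.execute("CREATE INDEX idx_country_emid ON customers (customerCountryCode, emotionID)")
conn.execute("CREATE INDEX idx_emid_custid ON customers (emotionID, customerID)")

# Ask the planner how it would run the filtered, grouped part of the query.
# The last column of each EXPLAIN QUERY PLAN row is a human-readable detail string.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT customerID, COUNT(*) FROM customers "
    "WHERE emotionID = 0 GROUP BY customerID"
).fetchall()
for row in plan:
    print(row[-1])
```

The printed plan should mention idx_emid_custid; the exact wording of the detail string differs between SQLite versions.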

Also, for the inner query, you can write:

(SELECT customers.customerCountryCode, Count( * ) AS totalPeople
    FROM `customers` 
    GROUP BY customerCountryCode) C 
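The two forms are only interchangeable if emotionID is never NULL: COUNT(*) counts rows, while COUNT(emotionID) counts only rows whose emotionID is not NULL. The question does not show whether the column is declared NOT NULL, so that is an assumption. A small sqlite3 sketch showing where they diverge; the extra unrated purchase for customer 7777 is hypothetical and not part of the question's data.

```python
import sqlite3

# Sample rows from the question:
# (purchaseID, customerID, emotionID, customerCountryCode, customerContinentCode)
ROWS = [
    (1, 2345, 0, "US", "NA"), (2, 2345, 3, "US", "NA"),
    (3, 4456, 0, "UK", "EU"), (3, 4456, 5, "UK", "EU"),
    (4, 4456, 2, "UK", "EU"), (5, 4456, 2, "UK", "EU"),
    (6, 1234, 0, "US", "NA"), (7, 6678, 0, "US", "NA"),
    (8, 9900, 0, "US", "NA"), (9, 3334, 0, "US", "NA"),
    (10, 3334, 4, "US", "NA"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (purchaseID INTEGER, customerID INTEGER, "
    "emotionID INTEGER, customerCountryCode TEXT, customerContinentCode TEXT)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?, ?)", ROWS)

COUNTS_SQL = "SELECT COUNT(*), COUNT(emotionID) FROM customers"

# Every sample row has an emotionID, so both counts agree.
before = conn.execute(COUNTS_SQL).fetchone()

# Hypothetical unrated purchase: the row is counted, the NULL emotionID is not.
conn.execute("INSERT INTO customers VALUES (11, 7777, NULL, 'US', 'NA')")
after = conn.execute(COUNTS_SQL).fetchone()

print(before, after)  # (11, 11) (12, 11)
```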
