
MySQL percentile rank by group

I have a table containing date, id, and value, with about 1000 id rows per date. I need to calculate the percentile rank of each row, by date. I am using the following code to compute the percentile rank for a single date, but with over 10 years of daily data it is very inefficient to run date by date. It seems this should be expressible in MySQL alone, but I have not been able to make it work.

Date   ID    Value
date1  01    -7.2
date1  02     0.6
date2  01     1.2
date2  02     3.8

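-- Percentile rank of every row for a single date.
-- `rank` is backtick-quoted because RANK is a reserved word in MySQL 8.0+.
SELECT c.id, c.value,
       ROUND(((@rank - c.`rank`) / @rank) * 100, 2) AS `rank`
FROM (
    SELECT *, @prev := @curr, @curr := a.value,
           @nxtRnk := @nxtRnk + 1,
           -- rows with equal values share the same rank
           @rank := IF(@prev = @curr, @rank, @nxtRnk) AS `rank`
    FROM (
        SELECT id, value
        FROM temp
        WHERE date = '2013-06-28'
    ) AS a, (
        -- initialize the user variables
        SELECT @curr := NULL, @prev := NULL, @rank := 0, @nxtRnk := 0
    ) AS b
    ORDER BY value DESC
) AS c;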

So basically I want to SELECT DISTINCT(date) and then, for each date, perform the above SELECT, which is preceded by INSERT INTO table2( ... ) to write the results to table2.

Thanks for any help, Hugh
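For readers on MySQL 8.0 or later, window functions offer another way to get a per-date percentile rank in one statement. This is not the approach used in this thread. The sketch below runs the query through Python's sqlite3 module as a stand-in (it needs an SQLite build with window function support), and the table name daily_values is hypothetical. Note that PERCENT_RANK() is defined as (rank - 1) / (rows in partition - 1), which differs from the formula in the query above, so the numbers will not match it exactly.

```python
import sqlite3


def percent_rank_by_date(rows):
    """Return (date, id, value, pct_rank) for each row, ranked within its date.

    rows: iterable of (date, id, value) tuples.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical table standing in for the asker's table.
    conn.execute("CREATE TABLE daily_values (date TEXT, id TEXT, value REAL)")
    conn.executemany("INSERT INTO daily_values VALUES (?, ?, ?)", rows)
    query = """
        SELECT date, id, value,
               ROUND(PERCENT_RANK() OVER (PARTITION BY date ORDER BY value) * 100, 2)
                   AS pct_rank
        FROM daily_values
        ORDER BY date, id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [
        ("date1", "01", -7.2),
        ("date1", "02", 0.6),
        ("date2", "01", 1.2),
        ("date2", "02", 3.8),
    ]
    for row in percent_rank_by_date(sample):
        print(row)
```

PARTITION BY date restarts the ranking for every date, which is the job done by looping over SELECT DISTINCT(date) in the description above.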

I finally developed an acceptable solution by using a temporary table. It may not be the optimal solution, but it runs in about 5 seconds on a table of over a million records.

My temporary table (t1) contains each date and the count of rows for that date.

The third SELECT above (the innermost one) is changed to SELECT t1.date, t1.cnt, id, value FROM t1 LEFT JOIN temp ON (t1.date = temp.date)

Also, the calculations in the first SELECT above were changed to use c.cnt rather than @rank, and a @prevDate variable was added to reset the rank count when the date changes.
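The final query is not shown in this answer, so the following is only a sketch of the logic it describes, written in Python so it can be run and checked. It assumes rows are processed by date and then by value descending, that tied values share a rank as in the original query, and that the percentage is (cnt - rank) / cnt * 100. The function name percentile_ranks is hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def percentile_ranks(rows):
    """Compute a percentile rank per row within each date.

    rows: iterable of (date, id, value) tuples.
    Returns a list of (date, id, value, pct) tuples.
    """
    out = []
    # Assumed ordering: date ascending, value descending.
    ordered = sorted(rows, key=lambda r: (r[0], -r[2]))
    for date, group in groupby(ordered, key=itemgetter(0)):
        group = list(group)
        cnt = len(group)      # plays the role of t1.cnt
        rank = 0              # plays the role of @rank, reset for each date
        prev_value = None     # plays the role of @prev
        for position, (_, row_id, value) in enumerate(group, start=1):
            if value != prev_value:
                rank = position   # new value: take the running position
            # otherwise the row is tied with the previous one and keeps its rank
            prev_value = value
            pct = round((cnt - rank) / cnt * 100, 2)
            out.append((date, row_id, value, pct))
    return out


if __name__ == "__main__":
    sample = [
        ("date1", "01", -7.2),
        ("date1", "02", 0.6),
        ("date2", "01", 1.2),
        ("date2", "02", 3.8),
    ]
    for row in percentile_ranks(sample):
        print(row)
```

Starting a fresh group for each date corresponds to the reset that @prevDate triggers, and dividing by the per-date count corresponds to using c.cnt in place of @rank.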

Thanks to anyone who looked at this and tried to work up a solution.

I was trying to solve this for quite some time, and then I found the following answer. Honestly brilliant. It is also quite fast even for big tables (the table I used it on contained approximately 5 million records and needed a couple of seconds).

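SELECT 
    -- Sort the values into one comma-separated list, cut the list off after
    -- the element at the 95% position, and keep the last remaining element.
    CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(field_name ORDER BY 
    field_name SEPARATOR ','), ',', 95/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) 
    AS `95th Per` 
FROM table_name;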

As you can imagine, just replace table_name and field_name with your table's and column's names.
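To make the string trick easier to follow, here is a small Python sketch that mimics what the expression does: build a sorted comma-separated list, keep the first k elements, and return the last of them. The function name percentile_by_concat is hypothetical, and rounding the fractional position half up to an integer is an assumption about how the count argument gets coerced, not something stated in this answer.

```python
import math


def percentile_by_concat(values, pct=95):
    """Mimic the GROUP_CONCAT / SUBSTRING_INDEX percentile trick.

    values: non-empty list of numbers; pct: percentile between 0 and 100.
    """
    ordered = sorted(values)                     # GROUP_CONCAT(... ORDER BY field_name)
    joined = ",".join(str(v) for v in ordered)   # SEPARATOR ','
    # Position of the wanted element: pct/100 * COUNT(*) + 1.
    # Assumption: the fractional position is rounded half up.
    k = math.floor(pct * len(ordered) / 100 + 1 + 0.5)
    head = joined.split(",")[:k]                 # inner SUBSTRING_INDEX(list, ',', k)
    return float(head[-1])                       # outer SUBSTRING_INDEX(..., ',', -1)


if __name__ == "__main__":
    print(percentile_by_concat(list(range(1, 101))))
```

Two things to keep in mind when using the SQL version: it returns a single percentile value for the column rather than a rank for each row, and MySQL truncates GROUP_CONCAT output at the group_concat_max_len setting (1024 by default), so that limit has to be raised for tables of any real size.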

For further information, check Roland Bouman's original post.
