MySQL: Different outputs from group by and partition by with dense_rank() over()?
I was doing a MySQL question on Leetcode (link: https://leetcode.com/problems/get-highest-answer-rate-question/ ). The question is to find the maximum. I used order by + limit 1 to get the answer. But what if there are multiple maximums? Limit 1 will only return one answer.
I tried to use dense_rank() to solve the problem, but I found that the outputs are different when I use partition by and group by.
Input: {"headers": {"survey_log": ["id", "action", "question_id", "answer_id", "q_num", "timestamp"]},"rows": {"survey_log": [[5, "show", 285, null, 1, 123], [5, "answer", 285, 124124, 1, 124], [5, "show", 369, null, 2, 125], [5, "skip", 369, null, 2, 126]]}}
If my code is:
# Method 1
select question_id,
dense_rank() over (order by count(case when action = 'answer' then 1 end)/
count(case when action = 'show' then 1 end) desc) as num
from survey_log
group by question_id
Then I get this output:
Output: {"headers": ["question_id", "num"], "values": [[285, 1], [369, 2]]}
However, when I tried to use partition by to achieve the same effect, the output is not what I want:
# Method 2
select question_id,
dense_rank() over (partition by question_id
order by count(case when action = 'answer' then 1 end)/
count(case when action = 'show' then 1 end) desc) as num
from survey_log
Output: {"headers": ["question_id", "num"], "values": [[285, 1]]}
I don't know why the outputs here are different. Can anyone explain? Thanks in advance!!
Update: I'm sorry that I didn't state the question clearly. The question is to "write a sql query to identify the question which has the highest answer rate."
"The highest answer rate meaning is: answer number's ratio in show number in the same question."
For the input above, question 285 has an answer rate of 1/1, while question 369 has an answer rate of 0/1, so the output should be 285.
My confusion is: why is the output of method 2 different from method 1? Thanks!!
I would start with a query that computes the answer rate for each question. From your problem statement, that should be:
select
question_id,
sum(action = 'answer') / nullif(sum(action = 'show'), 0) answer_rate
from survey_log
group by question_id
You can use that information to rank the questions. Method 2 has no group by, so its aggregates collapse the whole table into a single row before the window function runs, which is why it returns only one row. Keep the group by, and since you want to rank each question against all other groups, the window function should not have a partition clause:
select
question_id,
rank() over(order by sum(action = 'answer') / nullif(sum(action = 'show'), 0) desc) rn
from survey_log
group by question_id
order by rn
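To return only the top question or questions, including ties (the case that limit 1 misses), one option is to wrap the ranking query in a derived table and keep the rows ranked first. This is a minimal sketch that assumes MySQL 8.0 or later for window functions; the alias ranked is just an illustrative name:

```sql
-- Sketch: keep every question tied for the highest answer rate.
-- "ranked" is a hypothetical alias for the derived table.
select question_id
from (
    select
        question_id,
        rank() over(order by sum(action = 'answer') / nullif(sum(action = 'show'), 0) desc) rn
    from survey_log
    group by question_id
) ranked
where rn = 1
```

Since only the first rank is kept, rank() and dense_rank() should behave the same here; they differ only in how they number the rows after a tie.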