MySQL: Different outputs from group by and partition by with dense_rank() over()?

I was doing a MySQL question on LeetCode (link: https://leetcode.com/problems/get-highest-answer-rate-question/). The question asks for the maximum. I used order by + limit 1 to get the answer. But what if there are multiple maximums? limit 1 will only return one answer.

I tried to use dense_rank() to solve the problem, but I found that the outputs differ depending on whether I use partition by or group by.

Input: {"headers": {"survey_log": ["id", "action", "question_id", "answer_id", "q_num", "timestamp"]},"rows": {"survey_log": [[5, "show", 285, null, 1, 123], [5, "answer", 285, 124124, 1, 124], [5, "show", 369, null, 2, 125], [5, "skip", 369, null, 2, 126]]}}

If my code is:

# Method 1
select question_id, 
dense_rank() over (order by count(case when action = 'answer' then 1 end)/
                            count(case when action = 'show' then 1 end) desc) as num
from survey_log
group by question_id

Then I get the output:

Output: {"headers": ["question_id", "num"], "values": [[285, 1], [369, 2]]}

However, when I tried to use partition by to achieve the same effect, the output is not what I want:

# Method 2
select question_id, 
dense_rank() over (partition by question_id 
                   order by count(case when action = 'answer' then 1 end)/
                            count(case when action = 'show' then 1 end) desc) as num
from survey_log

Output: {"headers": ["question_id", "num"], "values": [[285, 1]]}

I don't know why the outputs differ. Can anyone explain? Thanks in advance!


Update: I'm sorry that I didn't state the question clearly. The question is to "write a sql query to identify the question which has the highest answer rate."

"The highest answer rate meaning is: answer number's ratio in show number in the same question."

For the input above, question 285 has an answer rate of 1/1, while question 369 has an answer rate of 0/1, so the output should be 285.

My confusion is: why is the output of method 2 different from that of method 1? Thanks!

I would start with a query that computes the answer rate for each question. From your problem statement, that should be:

select
    question_id,
    sum(action = 'answer') / nullif(sum(action = 'show'), 0) answer_rate
from survey_log
group by question_id
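The group by is what produces one row per question. For contrast, here is a sketch of what Method 2 works with before its window function runs. This is my reading of MySQL's behavior rather than something stated in the question: with aggregates and no group by, the whole table is treated as a single group, and window functions are evaluated after that aggregation, so dense_rank() only ever sees one row. The column aliases answers and shows are names chosen here for illustration.

```sql
-- No group by: the counts cover the whole table, so the result is a single row.
-- For the sample input that row holds 1 answer and 2 shows.
select
    count(case when action = 'answer' then 1 end) as answers,
    count(case when action = 'show' then 1 end) as shows
from survey_log;
```

Selecting question_id next to these aggregates is accepted only when ONLY_FULL_GROUP_BY is disabled, which is presumably how Method 2 ran at all. In that mode the question_id that comes back is taken from an arbitrary row, which would explain the lone [285, 1] in the output.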

You can use that answer rate to rank the questions. You want to rank each question against all other groups, so the window function should not have a partition by clause. With partition by question_id, the ranking would restart inside each question, and every question would be ranked 1:

select
    question_id,
    rank() over(order by sum(action = 'answer') / nullif(sum(action = 'show'), 0) desc) rn
from survey_log
group by question_id
order by rn
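To return every question tied for the highest rate, which was the original concern with limit 1, one option is to wrap the ranking query in a derived table and keep only the rows ranked first. This is a sketch assuming MySQL 8.0 or later, since window functions are not available in earlier versions; the alias ranked is a name chosen here for illustration.

```sql
-- Rank the questions by answer rate, then keep only those ranked first.
-- rank() gives tied questions the same number, so all ties are returned.
select question_id
from (
    select
        question_id,
        rank() over(order by sum(action = 'answer') / nullif(sum(action = 'show'), 0) desc) rn
    from survey_log
    group by question_id
) ranked
where rn = 1;
```

For the sample input this should return only question 285. If several questions shared the top answer rate, each of them would get rn = 1 and all of them would appear in the result.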
