How to calculate percentiles for every level in a game using PostgreSQL 9.2

I have a table of game logs, like this:


Level Shuffle_Count
  1        3
  2        1
  2        2
  2        1
  3        0
  3        4

Whenever a user plays a level, a row is added to the table. Each row records which level was played (level) and how many times a shuffle happened during that level (shuffle_count).

I want to know how often shuffles occur in each level by calculating the median of shuffle_count per level. With the code below, I can find the median for level 2 on its own. First, a CTE orders the shuffle_count values and divides them into 4 equal groups with ntile. Then I select the minimum shuffle_count among the rows whose value in the new quartile column is 3.

with ranked_test as (
    SELECT shuffle_count, ntile(4) OVER (ORDER BY shuffle_count) AS quartile FROM ch.public.game_log WHERE level = 2
)
SELECT min(shuffle_count) FROM ranked_test
WHERE quartile = 3
GROUP BY quartile;

This is the intermediate result, before selecting the min shuffle_count where quartile = 3 (which is approximately the median):

Shuffle_Count quartile
     0           1
     0           1
     2           2
     3           2
     4           3
     8           3
     12          4
     19          4
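
To make the bucketing concrete, here is a minimal sketch that rebuilds the intermediate result above from an inline VALUES list and shows the range each quartile covers. The CTE names sample and ranked and the aliases lowest and highest are made up for illustration; they are not part of the real schema.

```sql
-- Hypothetical stand-in for one level's shuffle_count values
-- (the eight values from the table above).
WITH sample(shuffle_count) AS (
    VALUES (0), (0), (2), (3), (4), (8), (12), (19)
),
ranked AS (
    -- ntile(4) splits the ordered rows into 4 buckets of (almost) equal size
    SELECT shuffle_count,
           ntile(4) OVER (ORDER BY shuffle_count) AS quartile
    FROM sample
)
SELECT quartile,
       min(shuffle_count) AS lowest,
       max(shuffle_count) AS highest
FROM ranked
GROUP BY quartile
ORDER BY quartile;
```

For these eight values the third bucket starts at 4, while the exact median is (3 + 4) / 2 = 3.5, so the min of quartile 3 is an approximation rather than the true median.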

So far so good. The problem is that I have over 1000 levels, and I can't do this manually for each one. I need the median value of shuffle_count for every level from 1 to 1000. I know this could be done in one line in PostgreSQL 9.4, but unfortunately I don't have that option right now.
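
For context, the 9.4 one-liner mentioned here is presumably the ordered-set aggregate percentile_cont, which was added in PostgreSQL 9.4 and is therefore not available on 9.2. A sketch, assuming the table can be referenced simply as game_log and using a made-up output alias:

```sql
-- Requires PostgreSQL 9.4 or later; fails on 9.2.
-- percentile_cont(0.5) interpolates between the two middle values
-- when a level has an even number of rows.
SELECT level,
       percentile_cont(0.5) WITHIN GROUP (ORDER BY shuffle_count) AS median_shuffle_count
FROM game_log
GROUP BY level
ORDER BY level;
```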

I couldn't make this work with a simple GROUP BY. I guess I need a more complex query, with a FOR loop or something similar.

Do you have any ideas? Thanks in advance.

I think that this should do it for your use case:

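with ranked_test as (
    -- number the quartiles separately within each level
    select
        level,
        shuffle_count,
        ntile(4) over(partition by level order by shuffle_count) as quartile
    from ch.public.game_log
)
-- the smallest value in the third quartile approximates the median
select level, quartile, min(shuffle_count) as approx_median
from ranked_test
where quartile = 3
group by level, quartile;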

This is basically an extended version of your working query:

  • in the CTE, we remove the filter on level from the where clause and add level to the partition by clause of the window function instead

  • in the outer query, we add level to the select and group by clauses
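
One thing to watch with the ntile approach: ntile(4) only hands out as many bucket numbers as there are rows, so a level with fewer than three rows has no row with quartile = 3 and drops out of the result. If an exact median is needed, a different technique that also works on 9.2 is to number the rows per level with row_number() and average the middle one or two. The sketch below is an alternative to the query above, not a variant of it; it runs against the sample rows from the question through an inline CTE named game_log, and the names numbered, rn, cnt and median_shuffle_count are made up for illustration.

```sql
-- Inline copy of the question's sample rows; replace this CTE
-- with the real table when running against actual data.
WITH game_log(level, shuffle_count) AS (
    VALUES (1, 3), (2, 1), (2, 2), (2, 1), (3, 0), (3, 4)
),
numbered AS (
    SELECT level,
           shuffle_count,
           -- position of the row within its level, smallest value first
           row_number() OVER (PARTITION BY level ORDER BY shuffle_count) AS rn,
           -- total number of rows in the level
           count(*) OVER (PARTITION BY level) AS cnt
    FROM game_log
)
SELECT level,
       avg(shuffle_count) AS median_shuffle_count
FROM numbered
-- integer division: for an odd cnt both expressions pick the single middle row,
-- for an even cnt they pick the two middle rows, which avg() then averages
WHERE rn IN ((cnt + 1) / 2, (cnt + 2) / 2)
GROUP BY level
ORDER BY level;
```

On the sample rows this yields a median of 3 for level 1, 1 for level 2 and 2 for level 3, whereas the ntile query would return a row only for level 2.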
