How to apply NTILE(4) using a range of column values?

I would like to use NTILE to see how countries are distributed by forested land as a percentage of total land area. The values in the column I want to use range from 0.00053 to just under 98.25, and the countries are not evenly distributed across the quartiles implied by that range, i.e. roughly 0 to 25, 25 to 50, 50 to 75, and 75 to 100. Instead, NTILE simply divides the table into four groups with the same number of rows. How do I use NTILE to assign quantiles based on values?

SELECT country, forest, pcnt_forest,
       NTILE(4) OVER(ORDER BY pcnt_forest) AS quartile
FROM percent_forest
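To see why NTILE behaves this way, here is a minimal Python sketch that mimics the standard NTILE semantics (it is not any database's actual implementation): rows are sorted, then split into n groups whose sizes differ by at most one, with the first (row count mod n) groups taking the extra row. The values only decide the order, never the group boundaries. The function name `ntile` and the sample percentages are hypothetical.

```python
def ntile(values, n):
    """Return the NTILE(n) group (1..n) for each value, in input order.

    Sketch of standard NTILE semantics: sort rows ascending, then cut them
    into n groups of (almost) equal row count. The first len(values) % n
    groups get one extra row. Magnitudes are ignored apart from ordering.
    """
    # Indices of the input rows in ascending value order (stable for ties).
    order = sorted(range(len(values)), key=lambda i: values[i])
    base, extra = divmod(len(values), n)
    groups = [0] * len(values)
    pos = 0
    for g in range(1, n + 1):
        size = base + (1 if g <= extra else 0)
        for i in order[pos:pos + size]:
            groups[i] = g
        pos += size
    return groups


# Hypothetical forest percentages: five small values and one large one.
print(ntile([0.5, 1.0, 2.0, 3.0, 4.0, 90.0], 4))  # [1, 1, 2, 2, 3, 4]
```

Even though 4.0 is far below 25, it lands in group 3 because only row counts matter. Ties are broken by position in this sketch; in SQL, tied rows may be split across groups unless the ORDER BY includes a tiebreaker.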

The WIDTH_BUCKET function is a good fit for this scenario:

WIDTH_BUCKET (Oracle) lets you construct equiwidth histograms, in which the histogram range is divided into intervals that have identical size. (Compare this function with NTILE, which creates equiheight histograms.)

It is supported by Oracle, Snowflake, PostgreSQL, ...
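As a rough model of what WIDTH_BUCKET(expr, low, high, count) computes, the Python sketch below follows the behavior documented for Oracle and PostgreSQL: [low, high) is split into `count` equal-width buckets numbered 1 to `count`, values below `low` return 0, and values at or above `high` return `count + 1`. The function `width_bucket` is a hypothetical stand-in, not a library API, and NULL handling is omitted.

```python
import math


def width_bucket(x, low, high, count):
    """Equal-width bucket number for x, modeled on SQL WIDTH_BUCKET.

    [low, high) is divided into `count` intervals of identical width.
    Underflow (x < low) returns 0; overflow (x >= high) returns count + 1.
    """
    if x < low:
        return 0
    if x >= high:
        return count + 1
    # Position within the range, scaled to 0..count, then shifted to 1-based.
    return math.floor((x - low) / (high - low) * count) + 1


# Fractions between 0 and 1, as in the demo data.
print([width_bucket(v, 0, 1, 4) for v in [0.05, 0.49, 0.51, 0.98]])  # [1, 2, 3, 4]
# Percentages between 0 and 100, as described in the question.
print(width_bucket(98.25, 0, 100, 4))  # 4
```

Note that a value exactly equal to the upper bound (for example 100 with bounds 0 and 100) falls into the overflow bucket `count + 1`, so the upper bound should be set just above the largest possible value if that case can occur.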

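Applied to your query (the demo data stores pcnt_forest as a fraction between 0 and 1, hence the bounds 0 and 1; for the 0 to 100 percentages described in the question, use WIDTH_BUCKET(pcnt_forest, 0, 100, 4)):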

SELECT country,  pcnt_forest
       ,WIDTH_BUCKET(pcnt_forest, 0, 1, 4) AS w
       ,NTILE(4) OVER(ORDER BY pcnt_forest) AS ntile  -- for comparison
FROM percent_forest
ORDER BY w

db<>fiddle demo

Output:

+----------+--------------+----+-------+
| COUNTRY  | PCNT_FOREST  | W  | NTILE |
+----------+--------------+----+-------+
| A        |         .05  | 1  |     1 |
| B        |         .06  | 1  |     1 |
| C        |         .07  | 1  |     2 |
| E        |         .49  | 2  |     2 |
| D        |         .51  | 3  |     3 |
| F        |         .96  | 4  |     3 |
| G        |         .97  | 4  |     4 |
| H        |         .98  | 4  |     4 |
+----------+--------------+----+-------+

You can use a CASE expression:

select pf.*,
       (case when pcnt_forest < 0.25 then 1
             when pcnt_forest < 0.50 then 2
             when pcnt_forest < 0.75 then 3
             else 4
        end) as bin
from percent_forest pf;
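To check the CASE expression end to end, the sketch below runs it against an in-memory SQLite database through Python's standard `sqlite3` module, using the eight sample rows from the db<>fiddle output in this thread. The helper name `case_bins` and the table definition are assumptions made for illustration; only the CASE logic comes from the answer.

```python
import sqlite3

# Sample rows taken from the demo output (pcnt_forest stored as a 0..1 fraction).
ROWS = [("A", 0.05), ("B", 0.06), ("C", 0.07), ("D", 0.51),
        ("E", 0.49), ("F", 0.96), ("G", 0.97), ("H", 0.98)]


def case_bins(rows):
    """Load rows into a throwaway table and return (country, bin) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE percent_forest (country TEXT, pcnt_forest REAL)")
    conn.executemany("INSERT INTO percent_forest VALUES (?, ?)", rows)
    result = conn.execute("""
        SELECT country,
               CASE WHEN pcnt_forest < 0.25 THEN 1
                    WHEN pcnt_forest < 0.50 THEN 2
                    WHEN pcnt_forest < 0.75 THEN 3
                    ELSE 4
               END AS bin
        FROM percent_forest pf
        ORDER BY pcnt_forest
    """).fetchall()
    conn.close()
    return result


print(case_bins(ROWS))
```

The bins match the W column of the WIDTH_BUCKET demo. Because the last branch is ELSE 4, a value of exactly 1.0 still lands in bin 4.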

Or, even simpler, use arithmetic:

select pf.*,
       floor(pcnt_forest * 4) + 1 bin
from percent_forest pf;
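The arithmetic form agrees with the CASE expression for fractions in [0, 1), but it is worth checking the edges. The Python sketch below mirrors floor(pcnt_forest * 4) + 1 and shows that an exact 1.0 yields 5. It also includes a hypothetical variant, `percent_bin`, for a column on the question's 0 to 100 scale: divide by the band width of 25 and clamp the result so that 100 stays in bin 4. Both function names are illustrative only.

```python
import math


def arithmetic_bin(fraction):
    """floor(pcnt_forest * 4) + 1, for values stored as fractions 0..1."""
    return math.floor(fraction * 4) + 1


def percent_bin(percent):
    """Hypothetical 0..100 variant: 25-point bands, with 100 clamped into bin 4."""
    return min(math.floor(percent / 25) + 1, 4)


print(arithmetic_bin(0.49), arithmetic_bin(1.0))  # 2 5
print(percent_bin(0.00053), percent_bin(98.25), percent_bin(100))  # 1 4 4
```

In SQL the same clamp can be written with LEAST (Oracle, PostgreSQL, Snowflake) or with the CASE expression, whichever the target database supports.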

I would not use the term "quartile" for this column. A quartile implies four equal-sized bins (or bins as close to equal as possible, given duplicate values).
