How can I reuse the result of a sub-query in a SELECT statement?
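I've been working with some data for a university course and I'm looking to optimise my queries.
The dataset I'm using is the UK national police stop-and-search data, and I'm trying to understand the correlation between ethnicity and the share of stops and searches each ethnicity receives.
I have a query that, for each police force and ethnicity combination, finds the total number of searches, the percentage of that force's searches involving this ethnicity compared with the other ethnicities in the same force, the national average percentage, and the difference between the force's percentage and the national average (confusing, I know).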
This is my current, "working" query:
```sql
SELECT c1.FORCE,
       c1.ETHNICITY,
       (SELECT COUNT(*) FROM CRIMES WHERE FORCE = c1.FORCE AND ETHNICITY = c1.ETHNICITY) AS num_searches,
       (ROUND(((SELECT COUNT(*) FROM CRIMES WHERE FORCE = c1.FORCE AND ETHNICITY = c1.ETHNICITY) /
               (SELECT COUNT(*) FROM CRIMES WHERE FORCE = c1.FORCE)::DECIMAL), 4) * 100) AS percentage_of_force,
       (SELECT ROUND((COUNT(*) / 303565::DECIMAL) * 100, 4) FROM CRIMES WHERE ETHNICITY = c1.ETHNICITY GROUP BY ETHNICITY) AS national_average,
       (SELECT (ROUND(((SELECT COUNT(*) FROM CRIMES WHERE FORCE = c1.FORCE AND ETHNICITY = c1.ETHNICITY) /
                       (SELECT COUNT(*) FROM CRIMES WHERE FORCE = c1.FORCE)::DECIMAL), 4) * 100)
             - (SELECT ROUND((COUNT(*) / 303565::DECIMAL) * 100, 4) FROM CRIMES WHERE ETHNICITY = c1.ETHNICITY GROUP BY ETHNICITY)) AS difference_from_average
FROM (SELECT * FROM CRIMES) AS c1
GROUP BY c1.FORCE, c1.ETHNICITY
ORDER BY c1.FORCE, c1.ETHNICITY;
```
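So my question is about reusing the same sub-query more than once in the SELECT part.
As you can see from the query above, `difference_from_average` is just `percentage_of_force` minus `national_average`, but I can't find a way to calculate these values once and then reuse them elsewhere in the SELECT part. How can I do this?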
Additional information
Sample input data
| date | ethnicity | force |
|------------|-----------|-----------------|
| 2018-01-01 | White | metropolitan |
| 2018-01-01 | White | west-yorkshire |
| 2018-01-01 | White | metropolitan |
| 2018-01-01 | White | metropolitan |
| 2018-01-01 | White | north-yorkshire |
| 2018-01-01 | White | west-yorkshire |
| 2018-01-01 | Black | metropolitan |
| 2018-01-01 | Undefined | metropolitan |
| 2018-01-01 | White | metropolitan |
| 2018-01-01 | White | metropolitan |
| 2018-01-01 | White | norfolk |
| 2018-01-01 | White | north-yorkshire |
| 2018-01-01 | White | northumbria |
| 2018-01-01 | White | west-yorkshire |
| 2018-01-01 | Black | metropolitan |
| 2018-01-01 | Black | metropolitan |
| 2018-01-01 | Black | metropolitan |
| 2018-01-01 | Black | metropolitan |
| 2018-01-01 | White | metropolitan |
| 2018-01-01 | Black | metropolitan |
Sample query result
| force | ethnicity | num_searches | percentage_of_force | national_average | difference_from_average |
|-------------------|-----------|--------------|---------------------|------------------|-------------------------|
| avon-and-somerset | Asian | 41 | 2.88 | 13.0641 | -10.1841 |
| avon-and-somerset | Black | 223 | 15.64 | 25.6798 | -10.0398 |
| avon-and-somerset | Other | 66 | 4.63 | 2.7368 | 1.8932 |
| avon-and-somerset | Undefined | 184 | 12.9 | 7.4699 | 5.4301 |
| avon-and-somerset | White | 912 | 63.96 | 50.941 | 13.019 |
| bedfordshire | Asian | 440 | 23.31 | 13.0641 | 10.2459 |
| bedfordshire | Black | 373 | 19.76 | 25.6798 | -5.9198 |
| bedfordshire | Mixed | 2 | 0.11 | 0.1084 | 0.0016 |
| bedfordshire | Other | 33 | 1.75 | 2.7368 | -0.9868 |
| bedfordshire | Undefined | 97 | 5.14 | 7.4699 | -2.3299 |
| bedfordshire | White | 943 | 49.95 | 50.941 | -0.991 |
| btp | Asian | 301 | 7.14 | 13.0641 | -5.9241 |
| btp | Black | 1274 | 30.23 | 25.6798 | 4.5502 |
| btp | Other | 71 | 1.68 | 2.7368 | -1.0568 |
| btp | Undefined | 48 | 1.14 | 7.4699 | -6.3299 |
| btp | White | 2521 | 59.81 | 50.941 | 8.869 |
I'm using PostgreSQL v11.2.
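There are several ways to simplify the query. You could use a series of CTEs to precompute the results at the different aggregation levels, but I think the most efficient and readable option is to use window functions.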
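For comparison, here is a rough sketch of the CTE route: one CTE per aggregation level (force and ethnicity, force, ethnicity, grand total), joined so that each percentage is computed once and the final SELECT only rounds and subtracts. To keep it runnable without a PostgreSQL server, it runs the SQL through Python's built-in `sqlite3` module on an in-memory table loaded with the 20 sample rows from the question; `SAMPLE` and `make_db` are hypothetical helper names for this illustration. The SQL avoids the PostgreSQL-only `::DECIMAL` cast (multiplying by `100.0` forces non-integer division instead) and takes the national total from `COUNT(*)` rather than the hard-coded 303565, so the national figures only reflect the sample.

```python
import sqlite3

# The question's 20 sample rows, as (ethnicity, force, number of copies).
SAMPLE = [
    ("White", "metropolitan", 6), ("Black", "metropolitan", 6),
    ("Undefined", "metropolitan", 1), ("White", "west-yorkshire", 3),
    ("White", "north-yorkshire", 2), ("White", "norfolk", 1),
    ("White", "northumbria", 1),
]


def make_db():
    """Build an in-memory stand-in for the question's CRIMES table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE crimes (date TEXT, ethnicity TEXT, force TEXT)")
    conn.executemany(
        "INSERT INTO crimes VALUES ('2018-01-01', ?, ?)",
        [(eth, force) for eth, force, n in SAMPLE for _ in range(n)],
    )
    return conn


# One CTE per aggregation level. Each percentage is computed once in `pct`,
# so the final SELECT reuses it instead of repeating sub-queries.
CTE_QUERY = """
WITH by_force_eth AS (
    SELECT force, ethnicity, COUNT(*) AS cnt
    FROM crimes
    GROUP BY force, ethnicity
), by_force AS (
    SELECT force, COUNT(*) AS cnt_force FROM crimes GROUP BY force
), by_eth AS (
    SELECT ethnicity, COUNT(*) AS cnt_eth FROM crimes GROUP BY ethnicity
), total AS (
    SELECT COUNT(*) AS cnt_total FROM crimes
), pct AS (
    SELECT fe.force, fe.ethnicity, fe.cnt,
           fe.cnt * 100.0 / f.cnt_force AS pct_force,
           e.cnt_eth * 100.0 / t.cnt_total AS pct_national
    FROM by_force_eth AS fe
    JOIN by_force AS f ON f.force = fe.force
    JOIN by_eth AS e ON e.ethnicity = fe.ethnicity
    CROSS JOIN total AS t
)
SELECT force, ethnicity, cnt AS num_searches,
       ROUND(pct_force, 4) AS percentage_of_force,
       ROUND(pct_national, 4) AS national_average,
       ROUND(pct_force - pct_national, 4) AS difference_from_average
FROM pct
ORDER BY force, ethnicity
"""

results = make_db().execute(CTE_QUERY).fetchall()
for row in results:
    print(row)
```

This version rounds only at the very end, so `difference_from_average` can differ in the last decimal place from a version that subtracts already-rounded percentages.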
All the intermediate counts can be computed in a subquery with `COUNT(...) OVER(...)` and different `PARTITION BY` clauses, like this:
```sql
SELECT
    force,
    ethnicity,
    COUNT(*) OVER(PARTITION BY force, ethnicity) AS cnt,
    COUNT(*) OVER(PARTITION BY force) AS cnt_force,
    COUNT(*) OVER(PARTITION BY ethnicity) AS cnt_ethnicity,
    ROW_NUMBER() OVER(PARTITION BY force, ethnicity) AS rn
FROM crimes
```
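To see why the outer query needs to filter on `rn`, the inner subquery can be run on its own against the question's sample rows. This sketch again uses Python's `sqlite3` with an in-memory database as a stand-in for PostgreSQL (SQLite has supported window functions since 3.25); `SAMPLE` and `make_db` are hypothetical helper names. Window functions do not collapse rows, so all 20 input rows come back, each carrying the counts for its own groups, and keeping only the rows numbered 1 leaves one row per force/ethnicity pair.

```python
import sqlite3

# The question's 20 sample rows, as (ethnicity, force, number of copies).
SAMPLE = [
    ("White", "metropolitan", 6), ("Black", "metropolitan", 6),
    ("Undefined", "metropolitan", 1), ("White", "west-yorkshire", 3),
    ("White", "north-yorkshire", 2), ("White", "norfolk", 1),
    ("White", "northumbria", 1),
]


def make_db():
    """Build an in-memory stand-in for the question's CRIMES table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE crimes (date TEXT, ethnicity TEXT, force TEXT)")
    conn.executemany(
        "INSERT INTO crimes VALUES ('2018-01-01', ?, ?)",
        [(eth, force) for eth, force, n in SAMPLE for _ in range(n)],
    )
    return conn


# The answer's inner subquery, unchanged: each window function attaches a
# group count to every row without collapsing any rows.
INNER_QUERY = """
SELECT force,
       ethnicity,
       COUNT(*) OVER(PARTITION BY force, ethnicity) AS cnt,
       COUNT(*) OVER(PARTITION BY force) AS cnt_force,
       COUNT(*) OVER(PARTITION BY ethnicity) AS cnt_ethnicity,
       ROW_NUMBER() OVER(PARTITION BY force, ethnicity) AS rn
FROM crimes
"""

rows = make_db().execute(INNER_QUERY).fetchall()
print(len(rows), "rows: one per input row")

# Keeping rn = 1 leaves exactly one row per (force, ethnicity) pair.
firsts = sorted(row[:5] for row in rows if row[5] == 1)
for row in firsts:
    print(row)
```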
The outer query can then compute the final results, while keeping only the first record of each `force`/`ethnicity` tuple to avoid duplicates.
Query:
```sql
SELECT
    force,
    ethnicity,
    cnt AS num_searches,
    ROUND(cnt / cnt_force::decimal * 100, 4) AS percentage_of_force,
    ROUND(cnt_ethnicity / 303565::decimal * 100, 4) AS national_average,
    ROUND(cnt / cnt_force::decimal * 100, 4)
        - ROUND(cnt_ethnicity / 303565::decimal * 100, 4) AS difference_from_average
FROM (
    SELECT
        force,
        ethnicity,
        COUNT(*) OVER(PARTITION BY force, ethnicity) AS cnt,
        COUNT(*) OVER(PARTITION BY force) AS cnt_force,
        COUNT(*) OVER(PARTITION BY ethnicity) AS cnt_ethnicity,
        ROW_NUMBER() OVER(PARTITION BY force, ethnicity) AS rn
    FROM crimes
) x
WHERE rn = 1
ORDER BY force, ethnicity;
```
Result on the sample input data (the national total is still the hard-coded 303565 from the question, which is why `national_average` is so small here):
| force | ethnicity | num_searches | percentage_of_force | national_average | difference_from_average |
| --------------- | --------- | ------------ | ------------------- | ---------------- | ----------------------- |
| metropolitan | Black | 6 | 46.1538 | 0.0020 | 46.1518 |
| metropolitan | Undefined | 1 | 7.6923 | 0.0003 | 7.6920 |
| metropolitan | White | 6 | 46.1538 | 0.0043 | 46.1495 |
| norfolk | White | 1 | 100.0000 | 0.0043 | 99.9957 |
| north-yorkshire | White | 2 | 100.0000 | 0.0043 | 99.9957 |
| northumbria | White | 1 | 100.0000 | 0.0043 | 99.9957 |
| west-yorkshire | White | 3 | 100.0000 | 0.0043 | 99.9957 |
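A possible variation on the query above, sketched under the same assumptions (Python's `sqlite3` as a stand-in for PostgreSQL, the question's sample rows, hypothetical `SAMPLE` and `make_db` helpers): the grand total can come from `COUNT(*) OVER ()` instead of the hard-coded 303565, so the national average stays correct if the table changes. `100.0` replaces the `::decimal` cast so the same SQL text also runs on SQLite.

```python
import sqlite3

# The question's 20 sample rows, as (ethnicity, force, number of copies).
SAMPLE = [
    ("White", "metropolitan", 6), ("Black", "metropolitan", 6),
    ("Undefined", "metropolitan", 1), ("White", "west-yorkshire", 3),
    ("White", "north-yorkshire", 2), ("White", "norfolk", 1),
    ("White", "northumbria", 1),
]


def make_db():
    """Build an in-memory stand-in for the question's CRIMES table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE crimes (date TEXT, ethnicity TEXT, force TEXT)")
    conn.executemany(
        "INSERT INTO crimes VALUES ('2018-01-01', ?, ?)",
        [(eth, force) for eth, force, n in SAMPLE for _ in range(n)],
    )
    return conn


# Same shape as the answer's query; only the national total changes:
# COUNT(*) OVER () counts every row in the table, replacing 303565.
QUERY = """
SELECT force, ethnicity, cnt AS num_searches,
       ROUND(cnt * 100.0 / cnt_force, 4) AS percentage_of_force,
       ROUND(cnt_ethnicity * 100.0 / cnt_total, 4) AS national_average,
       ROUND(cnt * 100.0 / cnt_force, 4)
           - ROUND(cnt_ethnicity * 100.0 / cnt_total, 4) AS difference_from_average
FROM (
    SELECT force, ethnicity,
           COUNT(*) OVER (PARTITION BY force, ethnicity) AS cnt,
           COUNT(*) OVER (PARTITION BY force) AS cnt_force,
           COUNT(*) OVER (PARTITION BY ethnicity) AS cnt_ethnicity,
           COUNT(*) OVER () AS cnt_total,
           ROW_NUMBER() OVER (PARTITION BY force, ethnicity) AS rn
    FROM crimes
) AS x
WHERE rn = 1
ORDER BY force, ethnicity
"""

results = make_db().execute(QUERY).fetchall()
for row in results:
    print(row)
```

Because the total is now the 20 sample rows, `national_average` for White comes out as 65.0 here rather than 0.0043.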
The trick is to use a subselect:
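```sql
-- f, g and h stand for arbitrary expressions: a and b are computed once
-- in the inner query and can then be reused freely in the outer one.
SELECT f(a, b), a, c
FROM (SELECT g(c, d) AS a,
             h(c) AS b,
             c, d
      FROM x) AS q;
```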
You get the idea.
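Applied to the question, the subselect trick could look like the following sketch: the inner query computes `percentage_of_force` and `national_average` once per force/ethnicity group (still using the question's correlated sub-queries and hard-coded 303565, with rounding as in the first answer), and the outer query just subtracts the two columns. It runs the SQL through Python's `sqlite3` with the question's sample rows as a stand-in for PostgreSQL; `SAMPLE` and `make_db` are hypothetical helpers, and `100.0` replaces the `::DECIMAL` cast. On the sample data it should reproduce the `difference_from_average` values from the first answer's result table, for example 46.1518 for metropolitan/Black.

```python
import sqlite3

# The question's 20 sample rows, as (ethnicity, force, number of copies).
SAMPLE = [
    ("White", "metropolitan", 6), ("Black", "metropolitan", 6),
    ("Undefined", "metropolitan", 1), ("White", "west-yorkshire", 3),
    ("White", "north-yorkshire", 2), ("White", "norfolk", 1),
    ("White", "northumbria", 1),
]


def make_db():
    """Build an in-memory stand-in for the question's CRIMES table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE crimes (date TEXT, ethnicity TEXT, force TEXT)")
    conn.executemany(
        "INSERT INTO crimes VALUES ('2018-01-01', ?, ?)",
        [(eth, force) for eth, force, n in SAMPLE for _ in range(n)],
    )
    return conn


QUERY = """
SELECT force,
       ethnicity,
       num_searches,
       percentage_of_force,
       national_average,
       -- Both inputs are plain columns of q here, so nothing is recomputed.
       percentage_of_force - national_average AS difference_from_average
FROM (
    SELECT c1.force,
           c1.ethnicity,
           COUNT(*) AS num_searches,
           ROUND(COUNT(*) * 100.0
                 / (SELECT COUNT(*) FROM crimes WHERE force = c1.force), 4)
               AS percentage_of_force,
           ROUND((SELECT COUNT(*) FROM crimes WHERE ethnicity = c1.ethnicity)
                 * 100.0 / 303565, 4) AS national_average
    FROM crimes AS c1
    GROUP BY c1.force, c1.ethnicity
) AS q
ORDER BY force, ethnicity
"""

results = make_db().execute(QUERY).fetchall()
for row in results:
    print(row)
```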