How can I reuse the result of a sub-query in a SELECT statement?

I've been crunching some data for a piece of university coursework and I'm looking to optimise my query.

The dataset I'm using is the UK national police data on stop and searches, and I'm trying to find correlations between ethnicity and the share of stop and searches each ethnic group receives.

I have a query that, for each combination of police force and ethnicity, finds the total number of searches, the percentage of that force's searches carried out on that ethnicity, the national average percentage for that ethnicity, and the difference between the force's percentage and the national average (boring and confusing, I know).

This is my current query which 'works':

SELECT c1.FORCE,
       c1.ETHNICITY,
       (SELECT COUNT(*) FROM CRIMES WHERE FORCE = c1.FORCE AND ETHNICITY = c1.ETHNICITY) AS num_searches,
       (ROUND(((SELECT COUNT(*) FROM CRIMES WHERE FORCE = c1.FORCE AND ETHNICITY = c1.ETHNICITY) /
           (SELECT COUNT(*) FROM CRIMES WHERE FORCE = c1.FORCE)::DECIMAL), 4) * 100) AS percentage_of_force,
       (SELECT ROUND((COUNT(*) / 303565::DECIMAL) * 100, 4) FROM CRIMES WHERE ETHNICITY = c1.ETHNICITY GROUP BY ETHNICITY) AS national_average,
       (SELECT (ROUND(((SELECT COUNT(*) FROM CRIMES WHERE FORCE = c1.FORCE AND ETHNICITY = c1.ETHNICITY) /
           (SELECT COUNT(*) FROM CRIMES WHERE FORCE = c1.FORCE)::DECIMAL), 4) * 100) - (SELECT ROUND((COUNT(*) / 303565::DECIMAL) * 100, 4) FROM CRIMES WHERE ETHNICITY = c1.ETHNICITY GROUP BY ETHNICITY)) AS difference_from_average
FROM (SELECT * FROM CRIMES) AS c1
GROUP BY c1.FORCE, c1.ETHNICITY
ORDER BY c1.FORCE, c1.ETHNICITY;

So the question I have revolves around reusing the same query in the 'SELECT' section more than once.

As you can see from the above query, difference_from_average is just percentage_of_force minus national_average. However, I can't figure out a way to calculate these values once and then reuse them elsewhere in the SELECT section. How can I achieve this?

Additional Info

Example Input Data

| date       | ethnicity | force           |
|------------|-----------|-----------------|
| 2018-01-01 | White     | metropolitan    |
| 2018-01-01 | White     | west-yorkshire  |
| 2018-01-01 | White     | metropolitan    |
| 2018-01-01 | White     | metropolitan    |
| 2018-01-01 | White     | north-yorkshire |
| 2018-01-01 | White     | west-yorkshire  |
| 2018-01-01 | Black     | metropolitan    |
| 2018-01-01 | Undefined | metropolitan    |
| 2018-01-01 | White     | metropolitan    |
| 2018-01-01 | White     | metropolitan    |
| 2018-01-01 | White     | norfolk         |
| 2018-01-01 | White     | north-yorkshire |
| 2018-01-01 | White     | northumbria     |
| 2018-01-01 | White     | west-yorkshire  |
| 2018-01-01 | Black     | metropolitan    |
| 2018-01-01 | Black     | metropolitan    |
| 2018-01-01 | Black     | metropolitan    |
| 2018-01-01 | Black     | metropolitan    |
| 2018-01-01 | White     | metropolitan    |
| 2018-01-01 | Black     | metropolitan    |

Example Query Result

| force             | ethnicity | num_searches | percentage_of_force | national_average | difference_from_average |
|-------------------|-----------|--------------|---------------------|------------------|-------------------------|
| avon-and-somerset | Asian     | 41           | 2.88                | 13.0641          | -10.1841                |
| avon-and-somerset | Black     | 223          | 15.64               | 25.6798          | -10.0398                |
| avon-and-somerset | Other     | 66           | 4.63                | 2.7368           | 1.8932                  |
| avon-and-somerset | Undefined | 184          | 12.9                | 7.4699           | 5.4301                  |
| avon-and-somerset | White     | 912          | 63.96               | 50.941           | 13.019                  |
| bedfordshire      | Asian     | 440          | 23.31               | 13.0641          | 10.2459                 |
| bedfordshire      | Black     | 373          | 19.76               | 25.6798          | -5.9198                 |
| bedfordshire      | Mixed     | 2            | 0.11                | 0.1084           | 0.0016                  |
| bedfordshire      | Other     | 33           | 1.75                | 2.7368           | -0.9868                 |
| bedfordshire      | Undefined | 97           | 5.14                | 7.4699           | -2.3299                 |
| bedfordshire      | White     | 943          | 49.95               | 50.941           | -0.991                  |
| btp               | Asian     | 301          | 7.14                | 13.0641          | -5.9241                 |
| btp               | Black     | 1274         | 30.23               | 25.6798          | 4.5502                  |
| btp               | Other     | 71           | 1.68                | 2.7368           | -1.0568                 |
| btp               | Undefined | 48           | 1.14                | 7.4699           | -6.3299                 |
| btp               | White     | 2521         | 59.81               | 50.941           | 8.869                   |

I'm using PostgreSQL v11.2.

There are different ways to simplify the query. You could use a series of CTEs to pre-compute the results for the different levels of aggregation. But I think that the most efficient and readable option is to use window functions.

All intermediate counts can be computed in a subquery, using COUNT(...) OVER(...) with various PARTITION BY options, as follows:

SELECT
    force,
    ethnicity,
    COUNT(*) OVER(PARTITION BY force, ethnicity) AS cnt,
    COUNT(*) OVER(PARTITION BY force) AS cnt_force,
    COUNT(*) OVER(PARTITION BY ethnicity) AS cnt_ethnicity,
    ROW_NUMBER() OVER(PARTITION BY force, ethnicity) AS rn
FROM crimes

Then the outer query can compute the final results (while filtering on the first record in each force / ethnicity tuple to avoid duplicates).

Query:

SELECT 
    force,
    ethnicity,
    cnt AS num_searches,
    ROUND(cnt / cnt_force::decimal * 100, 4) AS percentage_of_force,
    ROUND(cnt_ethnicity / 303565::decimal * 100, 4) AS national_average,
    ROUND(cnt / cnt_force::decimal * 100, 4) 
        - ROUND(cnt_ethnicity / 303565::decimal * 100, 4) AS difference_from_average
FROM (
    SELECT
        force,
        ethnicity,
        COUNT(*) OVER(PARTITION BY force, ethnicity) AS cnt,
        COUNT(*) OVER(PARTITION BY force) AS cnt_force,
        COUNT(*) OVER(PARTITION BY ethnicity) AS cnt_ethnicity,
        ROW_NUMBER() OVER(PARTITION BY force, ethnicity) AS rn
    FROM crimes
    ) x
WHERE rn = 1
ORDER BY force, ethnicity;

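Demo on DB Fiddle, run against the 20 sample rows from the question (national_average comes out tiny here because the query keeps the hard-coded total of 303565 as its denominator):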

| force           | ethnicity | num_searches | percentage_of_force | national_average | difference_from_average |
| --------------- | --------- | ------------ | ------------------- | ---------------- | ----------------------- |
| metropolitan    | Black     | 6            | 46.1538             | 0.0020           | 46.1518                 |
| metropolitan    | Undefined | 1            | 7.6923              | 0.0003           | 7.6920                  |
| metropolitan    | White     | 6            | 46.1538             | 0.0043           | 46.1495                 |
| norfolk         | White     | 1            | 100.0000            | 0.0043           | 99.9957                 |
| north-yorkshire | White     | 2            | 100.0000            | 0.0043           | 99.9957                 |
| northumbria     | White     | 1            | 100.0000            | 0.0043           | 99.9957                 |
| west-yorkshire  | White     | 3            | 100.0000            | 0.0043           | 99.9957                 |
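
The CTE alternative mentioned at the start of this answer could look roughly like the sketch below. It is not taken from the original answer: the CTE names (pair_counts, force_counts, ethnicity_counts, total, shares) are made up for illustration, and it assumes that the hard-coded 303565 in the question is simply the total number of rows in crimes, so it derives that total instead of repeating the constant. On a subset of the data, national_average will therefore differ from the demo output above. It also assumes force and ethnicity are never NULL, since the joins would drop such rows.

```sql
-- Sketch only: CTE names are illustrative, not from the original answer.
WITH pair_counts AS (
    -- one row per (force, ethnicity) with its number of searches
    SELECT force, ethnicity, COUNT(*) AS cnt
    FROM crimes
    GROUP BY force, ethnicity
),
force_counts AS (
    -- total searches per force
    SELECT force, SUM(cnt) AS cnt_force
    FROM pair_counts
    GROUP BY force
),
ethnicity_counts AS (
    -- total searches per ethnicity across all forces
    SELECT ethnicity, SUM(cnt) AS cnt_ethnicity
    FROM pair_counts
    GROUP BY ethnicity
),
total AS (
    -- assumption: replaces the hard-coded 303565
    SELECT SUM(cnt) AS cnt_all
    FROM pair_counts
),
shares AS (
    -- each percentage is computed exactly once here
    SELECT
        p.force,
        p.ethnicity,
        p.cnt AS num_searches,
        ROUND(100.0 * p.cnt / f.cnt_force, 4) AS percentage_of_force,
        ROUND(100.0 * e.cnt_ethnicity / t.cnt_all, 4) AS national_average
    FROM pair_counts p
    JOIN force_counts f ON f.force = p.force
    JOIN ethnicity_counts e ON e.ethnicity = p.ethnicity
    CROSS JOIN total t
)
SELECT
    force,
    ethnicity,
    num_searches,
    percentage_of_force,
    national_average,
    -- reuse the two aliases instead of repeating their expressions
    percentage_of_force - national_average AS difference_from_average
FROM shares
ORDER BY force, ethnicity;
```

Because shares already exposes percentage_of_force and national_average as columns, the final SELECT can subtract them by name, which is the reuse the question asks for.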

The trick is to use subselects:

SELECT f(a, b), a, c
FROM (SELECT g(c, d) AS a,
             h(c) AS b, 
             c, d
      FROM x) AS q;

You get the idea.
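
Applied to the query in the question, that pattern might look like the sketch below. It is one possible reading of the generic example, not the answerer's own code: the inner query computes each value once and gives it an alias, and the outer query reuses the aliases. The alias q and the use of SUM(COUNT(*)) OVER (...) to get the per-force and per-ethnicity totals are choices made for this illustration; the constant 303565 is kept exactly as in the question.

```sql
-- Sketch only: inner query computes each value once, outer query reuses it.
SELECT
    force,
    ethnicity,
    num_searches,
    percentage_of_force,
    national_average,
    percentage_of_force - national_average AS difference_from_average
FROM (
    SELECT
        force,
        ethnicity,
        COUNT(*) AS num_searches,
        -- this group's count divided by the total for the same force
        ROUND(100.0 * COUNT(*)
              / SUM(COUNT(*)) OVER (PARTITION BY force), 4) AS percentage_of_force,
        -- total for this ethnicity across all forces, over the
        -- hard-coded national total taken from the question
        ROUND(100.0 * SUM(COUNT(*)) OVER (PARTITION BY ethnicity)
              / 303565, 4) AS national_average
    FROM crimes
    GROUP BY force, ethnicity
) AS q
ORDER BY force, ethnicity;
```

Window functions are evaluated after GROUP BY, so SUM(COUNT(*)) OVER (...) adds up the per-group counts within each partition.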
