
Combining Several UNION ALL into a single query

I am working on a view that consists of the following highly repetitive statements:

SELECT 10 as [TopN],
    (Select SUM(A) FROM (Select TOP 10 A FROM [dbo].TheTable ORDER BY A DESC) T) as A,
    (Select SUM(B) FROM (Select TOP 10 B FROM [dbo].TheTable ORDER BY B DESC) T) as B,
    (Select SUM(C) FROM (Select TOP 10 C FROM [dbo].TheTable ORDER BY C DESC) T) as C,
    (Select SUM(D) FROM (Select TOP 10 D FROM [dbo].TheTable ORDER BY D DESC) T) as D,
    (Select SUM(E) FROM (Select TOP 10 E FROM [dbo].TheTable ORDER BY E DESC) T) as E
UNION ALL
SELECT 100 as [TopN],
    (Select SUM(A) FROM (Select TOP 100 A FROM [dbo].TheTable ORDER BY A DESC) T) as A,
    (Select SUM(B) FROM (Select TOP 100 B FROM [dbo].TheTable ORDER BY B DESC) T) as B,
    (Select SUM(C) FROM (Select TOP 100 C FROM [dbo].TheTable ORDER BY C DESC) T) as C,
    (Select SUM(D) FROM (Select TOP 100 D FROM [dbo].TheTable ORDER BY D DESC) T) as D,
    (Select SUM(E) FROM (Select TOP 100 E FROM [dbo].TheTable ORDER BY E DESC) T) as E
UNION ALL
SELECT 1000 as [TopN],
    (Select SUM(A) FROM (Select TOP 1000 A FROM [dbo].TheTable ORDER BY A DESC) T) as A,
    (Select SUM(B) FROM (Select TOP 1000 B FROM [dbo].TheTable ORDER BY B DESC) T) as B,
    (Select SUM(C) FROM (Select TOP 1000 C FROM [dbo].TheTable ORDER BY C DESC) T) as C,
    (Select SUM(D) FROM (Select TOP 1000 D FROM [dbo].TheTable ORDER BY D DESC) T) as D,
    (Select SUM(E) FROM (Select TOP 1000 E FROM [dbo].TheTable ORDER BY E DESC) T) as E
UNION ALL
--etc...
--  The same 7 lines of code repeated dozens of times for different values of `TopN`

The goal is to produce a table of top-value summations for each column.

This is what the table looks like:

| TOPN |   A   |   B   |   C   |   D   |   E   |
|   10 |   234 |  ...
|  100 |   734 |  ...
| 1000 |  1298 |  ...
|  ... |   ... |  ...

WHY do I need this query?

In the real world, a summary report such as this answers questions like:

  • "What is the total income of the top 10 income earners?", "What is the total income of the top 100 income earners?", etc., in column "A"
  • "What is the total debt of the top 10 debt holders?", etc., in column "B"

And so on. Each column is a report based on the "standalone" ordering of that column. As such, the above table is an end-user deliverable.


WHAT am I looking for?

A version of the above query that is any of the following: simpler, deduplicated, more efficient, more maintainable.

The above query works fine and produces the desired table that I have mocked up above, but it is clearly inefficient. For example, someone doing this by hand would be able to:

  • Sort by each column once
  • Start summing until they reach the 10th sorted item, and write down that total
  • Continue summing until they reach the 100th sorted item, and write down that total
  • And so on, without re-sorting or starting over at element 1 again
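The by-hand procedure above can be sketched for a single column with window functions. This is only an illustration, assuming SQL Server 2012 or later (for the `ROWS` window frame) and the `dbo.TheTable` / `A` names from the question; `Ranked`, `Running`, `rn`, and `RunningA` are names made up for the sketch.

```sql
-- Sketch for column A only: sort once, keep a running total,
-- then read the total off at the 10th, 100th and 1000th row.
WITH Ranked AS (
    -- Number the rows from largest A to smallest A.
    SELECT A,
           ROW_NUMBER() OVER (ORDER BY A DESC) AS rn
    FROM dbo.TheTable
),
Running AS (
    -- Running total of A in that same order (rn is unique, so this is deterministic).
    SELECT rn,
           SUM(A) OVER (ORDER BY rn
                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningA
    FROM Ranked
)
SELECT N.TopN,
       R.RunningA AS A
FROM (VALUES (10), (100), (1000)) AS N(TopN)
JOIN Running AS R
    ON R.rn = N.TopN;   -- pick the running total at exactly the N-th row
```

One difference from the `TOP`-based query: if the table has fewer rows than a given `TopN`, this join returns no row for that value, whereas `TOP` would sum whatever rows exist.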

The original query is also repetitious. For example, if this were a stored procedure, one could loop over a list of values (10, 100, 1000, etc.) and produce this table one row at a time using a single chunk of parameterized code (as in @Larnu's answer below). This approach is not supported by views, though. Since the current implementation is a view, converting it to a stored procedure or function that must be executed differently would be considered a regression, because all existing usages would have to be modified.

Therefore, my ask is simply whether there is any way to make this better.


My Ideas

Ideally, I could inline a list of values, for example:

Select * From (VALUES (10), (100), (1000), (5000), ...) AS TOPN_VALUES(TOPN)

Or I'm happy to have those values captured in a simple 1-column table somewhere.

Either way, what remains is the need for (likely) some clever join or cross apply logic to generate all of the above table entries from that list of numbers, as opposed to having the numbers hard-coded among dozens of chunks of copy-pasted SELECT statements as they are in the original query.

One thing that's clear to me is that SELECT TOP X cannot be parameterized in a view, so at the very least, we will have to re-implement that logic in a different way. One potential solution is to rewrite:

Select SUM(A) FROM (Select TOP 10 A FROM [dbo].SomeTable ORDER BY A DESC) T

As:

Select SUM(A) FROM (Select A, ROW_NUMBER() over (order by A desc) as [Rank] FROM [dbo].SomeTable) T
WHERE [Rank] <= 10

At which point the "10" above can be dynamically determined by a value joined in from another table. Still lots of work to be done here to get to a full solution though...
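One way this idea might be carried through to a full view is sketched below. It is an assumption-laden sketch rather than a tested solution: the view name `dbo.TopNSums_Ranked` and the `Ranked` / `RankA` ... `RankE` names are hypothetical, and the list of `TopN` values is the one from the question.

```sql
CREATE VIEW dbo.TopNSums_Ranked   -- hypothetical view name
AS
WITH Ranked AS (
    -- Rank every row independently by each column, largest value first.
    SELECT A, B, C, D, E,
           ROW_NUMBER() OVER (ORDER BY A DESC) AS RankA,
           ROW_NUMBER() OVER (ORDER BY B DESC) AS RankB,
           ROW_NUMBER() OVER (ORDER BY C DESC) AS RankC,
           ROW_NUMBER() OVER (ORDER BY D DESC) AS RankD,
           ROW_NUMBER() OVER (ORDER BY E DESC) AS RankE
    FROM dbo.TheTable
)
SELECT N.TopN,
       -- Each column sums only the rows that fall inside its own top N.
       SUM(CASE WHEN R.RankA <= N.TopN THEN R.A END) AS A,
       SUM(CASE WHEN R.RankB <= N.TopN THEN R.B END) AS B,
       SUM(CASE WHEN R.RankC <= N.TopN THEN R.C END) AS C,
       SUM(CASE WHEN R.RankD <= N.TopN THEN R.D END) AS D,
       SUM(CASE WHEN R.RankE <= N.TopN THEN R.E END) AS E
FROM (VALUES (10), (100), (1000)) AS N(TopN)   -- the list of TopN values lives in one place
CROSS JOIN Ranked AS R
GROUP BY N.TopN;
```

Adding a new `TopN` row then means adding one value to the `VALUES` list. The cross join pairs every table row with every `TopN` value, so whether this is actually faster than the repeated `TOP` subqueries would need to be checked against the execution plan on real data.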


Thanks for your help

If I am reading between the lines correctly, then use an inline table-valued function:

CREATE FUNCTION dbo.YourFunction (@Top int) 
RETURNS table
AS RETURN
    SELECT @Top AS [TopN],
           (SELECT SUM(A) FROM (Select TOP (@Top) A FROM [dbo].SomeTable ORDER BY A DESC) T) as A,
           (SELECT SUM(B) FROM (Select TOP (@Top) B FROM [dbo].SomeTable ORDER BY B DESC) T) as B,
           (SELECT SUM(C) FROM (Select TOP (@Top) C FROM [dbo].SomeTable ORDER BY C DESC) T) as C,
           (SELECT SUM(D) FROM (Select TOP (@Top) D FROM [dbo].SomeTable ORDER BY D DESC) T) as D,
           (SELECT SUM(E) FROM (Select TOP (@Top) E FROM [dbo].SomeTable ORDER BY E DESC) T) as E;
GO

Then you just call the function like below:

SELECT *
FROM dbo.YourFunction(10);
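If the result must remain a view so that existing callers are unaffected, the function could be wrapped with `CROSS APPLY` over an inline list of values. This is a sketch under that assumption; the view name `dbo.TopNSums_Apply` is hypothetical, and `dbo.YourFunction` is the function defined above.

```sql
CREATE VIEW dbo.TopNSums_Apply   -- hypothetical view name
AS
SELECT F.TopN, F.A, F.B, F.C, F.D, F.E
FROM (VALUES (10), (100), (1000)) AS N(TopN)   -- one entry per desired row
CROSS APPLY dbo.YourFunction(N.TopN) AS F;     -- runs the function once per TopN value
```

This removes the copy-pasted blocks, but each `TopN` value still triggers its own `TOP ... ORDER BY` for every column, so it addresses maintainability more than efficiency.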


 