
Distribute sequential SQL results evenly based on count

I have SQL results that I need to break into item ranges, with the count distributed evenly across a number of tasks. What is a good way to do this?

My data looks like this.

+------+-------+----------+
| Item | Count | ItmGroup |
+------+-------+----------+
| 1A   |   100 |        1 |
| 1B   |    25 |        1 |
| 1C   |     2 |        1 |
| 1D   |     6 |        1 |
| 2A   |    88 |        2 |
| 2B   |    10 |        2 |
| 2C   |   122 |        2 |
| 2D   |    12 |        2 |
| 3A   |     4 |        3 |
| 3B   |   103 |        3 |
| 3C   |     1 |        3 |
| 3D   |    22 |        3 |
| 4A   |    55 |        4 |
| 4B   |    42 |        4 |
| 4C   |   100 |        4 |
| 4D   |     1 |        4 |
+------+-------+----------+

Item = the item code. Count = in this context, a measure of the item's popularity; it can be used to RANK items if need be. ItmGroup = the parent value for the Item column; each Item is contained in a group.

What differentiates this from other similar questions I've viewed is that the ranges I need to determine cannot be taken out of the order they show in this table. We can do an item range from 1A to 3B; in other words, ranges can cross over ItmGroups, but they must remain in alphanumeric order by Item.

The expected result would be item ranges that evenly distribute the total count.

+--------+--------+----------+
| FrItem | ToItem | TotCount |
+--------+--------+----------+
| 1A     | 2D     |      134 |
| 3A     | 3D     |      130 |
(etc)

Provided you're happy with a rough estimate, this will split the data into two groups.

The first group will always contain as many records as possible without exceeding half of the total count, and the second group will contain the rest.

WITH
  cumulative AS
(
  SELECT
    *,
    SUM([Count]) OVER (ORDER BY Item)   AS cumulativeCount,  -- running total in Item order
    SUM([Count]) OVER ()                AS totalCount        -- grand total, repeated on every row
  FROM
    yourData
)
SELECT
  MIN(Item)    AS frItem,
  MAX(Item)    AS toItem,
  SUM([Count]) AS TotCount
FROM
  cumulative
GROUP BY
  -- group 0: rows up to the halfway mark; group 1: everything after it
  CASE WHEN cumulativeCount <= totalCount / 2 THEN 0 ELSE 1 END
ORDER BY
  CASE WHEN cumulativeCount <= totalCount / 2 THEN 0 ELSE 1 END;
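
To sanity-check the query outside the database, the same logic can be sketched in Python against the sample data from the question. This is only an illustration: the `rows` list and the function name `split_in_two` are assumptions made here, and the test `2 * running <= total` stands in for the SQL's `cumulativeCount <= totalCount / 2` under integer division.

```python
# Sample data from the question, already in Item order: (Item, Count).
rows = [
    ("1A", 100), ("1B", 25), ("1C", 2), ("1D", 6),
    ("2A", 88), ("2B", 10), ("2C", 122), ("2D", 12),
    ("3A", 4), ("3B", 103), ("3C", 1), ("3D", 22),
    ("4A", 55), ("4B", 42), ("4C", 100), ("4D", 1),
]


def split_in_two(rows):
    """Mimic the query: group 0 holds rows whose running total is <= half."""
    total = sum(count for _, count in rows)
    groups = {}  # group number -> [frItem, toItem, TotCount]
    running = 0
    for item, count in rows:
        running += count
        group = 0 if 2 * running <= total else 1
        if group not in groups:
            groups[group] = [item, item, 0]
        groups[group][1] = item   # rows arrive in Item order, so the last one seen is MAX(Item)
        groups[group][2] += count
    return [tuple(groups[g]) for g in sorted(groups)]


for fr_item, to_item, tot_count in split_in_two(rows):
    print(fr_item, to_item, tot_count)
```

On the sample data this yields 1A to 2B (231) and 2C to 4D (462), which shows how rough the estimate can be when a large item sits right at the halfway mark.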

To split the data into five portions, the approach is similar; only the GROUP BY (and the matching ORDER BY) changes:

GROUP BY
  CASE WHEN cumulativeCount <= totalCount * 1/5 THEN 0
       WHEN cumulativeCount <= totalCount * 2/5 THEN 1
       WHEN cumulativeCount <= totalCount * 3/5 THEN 2
       WHEN cumulativeCount <= totalCount * 4/5 THEN 3
                                                ELSE 4 END
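
The same CASE ladder can be written for any number of portions. Below is a minimal Python sketch of that generalisation, assuming the rows are already sorted by Item; `split_into_parts` and `rows` are names chosen here for illustration and are not part of the answer. The comparison `running * parts <= total * (k + 1)` is the integer-safe form of `cumulativeCount <= totalCount * (k + 1) / parts`.

```python
# Sample data from the question, already in Item order: (Item, Count).
rows = [
    ("1A", 100), ("1B", 25), ("1C", 2), ("1D", 6),
    ("2A", 88), ("2B", 10), ("2C", 122), ("2D", 12),
    ("3A", 4), ("3B", 103), ("3C", 1), ("3D", 22),
    ("4A", 55), ("4B", 42), ("4C", 100), ("4D", 1),
]


def split_into_parts(rows, parts):
    """Return (bucket, frItem, toItem, TotCount) tuples, buckets numbered from 0."""
    total = sum(count for _, count in rows)
    buckets = {}
    running = 0
    for item, count in rows:
        running += count
        bucket = parts - 1  # corresponds to the ELSE branch of the CASE
        for k in range(parts - 1):
            if running * parts <= total * (k + 1):
                bucket = k
                break
        fr_item, _, tot = buckets.get(bucket, (item, item, 0))
        buckets[bucket] = (fr_item, item, tot + count)
    # Buckets that received no rows are simply absent, as with GROUP BY.
    return [(k,) + buckets[k] for k in sorted(buckets)]


for row in split_into_parts(rows, 5):
    print(row)
```

For the sample data and five portions, the ranges come out as 1A to 1D (133), 2A to 2B (98), 2C to 3A (138), 3B to 4A (181) and 4B to 4D (143).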

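Depending on your data, this isn't necessarily ideal. In the example below the total is 17, so only 1A fits under the halfway mark and the split is 4 versus 13, whereas grouping 1A with 2A would give 9 versus 8.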

 Item | Count | GroupAsDefinedAbove | IdealGroup
------+-------+---------------------+------------
  1A  |     4 |                   1 |          1
  2A  |     5 |                   2 |          1
  3A  |     8 |                   2 |          2
If you want something that can get the two groups as close in size as possible, that's a lot more complex.
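
For exactly two groups, one way to get closer is to try every possible cut point and keep the one where the two totals differ least. The Python sketch below does that. It is not part of the answer, `best_two_way_cut` is a name made up here, and it does not extend to more than two groups without further work.

```python
def best_two_way_cut(rows):
    """Return (rows_in_first_group, first_total, second_total) for the most even contiguous split.

    `rows` is a list of (Item, Count) pairs already sorted by Item.
    """
    if len(rows) < 2:
        raise ValueError("need at least two rows to form two groups")
    total = sum(count for _, count in rows)
    best = None
    running = 0
    # Stop one row short of the end so the second group is never empty.
    for i, (_, count) in enumerate(rows[:-1], start=1):
        running += count
        diff = abs(total - 2 * running)  # |second_total - first_total|
        if best is None or diff < best[0]:
            best = (diff, i, running, total - running)
    return best[1:]


print(best_two_way_cut([("1A", 4), ("2A", 5), ("3A", 8)]))  # (2, 9, 8)
```

Applied to the three-row example above, the cut falls after 2A, giving totals of 9 and 8.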

This is the same as the accepted answer, except that it declares a batch count and adds a computed BatchNo column to the SELECT inside the cumulativeCte CTE to prevent a remainder.

DECLARE @BatchCount NUMERIC(4,2) = 5.00;  -- NUMERIC so the division below is not integer division

WITH
  cumulativeCte AS
(
  SELECT
    r.*,
    SUM(r.[Count]) OVER (ORDER BY r.Item)   AS cumulativeCount,
    SUM(r.[Count]) OVER ()                  AS totalCount,
    -- running total divided by the ideal batch size, rounded up to a batch number starting at 1
    CEILING(SUM(r.[Count]) OVER (ORDER BY r.Item) / (SUM(r.[Count]) OVER () / @BatchCount)) AS BatchNo
  FROM
    records r
)
SELECT
  MIN(c.Item)    AS frItem,
  MAX(c.Item)    AS toItem,
  SUM(c.[Count]) AS TotCount,
  c.BatchNo
FROM
  cumulativeCte c
GROUP BY
  c.BatchNo
ORDER BY
  c.BatchNo;
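
To see what the BatchNo expression assigns, here is a small Python sketch of the same arithmetic on the question's sample data. The names `rows` and `batch_numbers` are assumptions for illustration, and the ceiling is computed with integer arithmetic rather than the decimal division the SQL uses.

```python
# Sample data from the question, already in Item order: (Item, Count).
rows = [
    ("1A", 100), ("1B", 25), ("1C", 2), ("1D", 6),
    ("2A", 88), ("2B", 10), ("2C", 122), ("2D", 12),
    ("3A", 4), ("3B", 103), ("3C", 1), ("3D", 22),
    ("4A", 55), ("4B", 42), ("4C", 100), ("4D", 1),
]


def batch_numbers(rows, batch_count):
    """Return (Item, BatchNo) pairs with BatchNo = ceil(running_total * batch_count / total)."""
    total = sum(count for _, count in rows)
    result = []
    running = 0
    for item, count in rows:
        running += count
        # Integer ceiling division: -((-a) // b) equals ceil(a / b) for positive b.
        batch_no = -((-running * batch_count) // total)
        result.append((item, batch_no))
    return result


for item, batch_no in batch_numbers(rows, 5):
    print(item, batch_no)
```

With five batches, the sample data lands in the same ranges as the CASE version (1A to 1D, 2A to 2B, 2C to 3A, 3B to 4A, 4B to 4D), numbered from 1 instead of 0.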
