SQL Group By with Sum for each date and using max date

I've seen a lot of similar questions, but nothing that quite nails my particular problem.

I have a table storing multiple positions for each account. Changes are stored as deltas. For example, on day 1 the table holds the following:

AC_ID | POS_ID | ASAT       | VAL
    1 |      1 | 2016-01-01 | 100
    1 |      2 | 2016-01-01 | 200

The total value for AC_ID 1 is 300 on 2016-01-01. The next day the table may be updated to:

AC_ID | POS_ID | ASAT       | VAL
    1 |      1 | 2016-01-01 | 100
    1 |      2 | 2016-01-01 | 200
    1 |      2 | 2016-01-02 | 250

Now the total value for AC_ID 1 is 350. This is because the new record for POS_ID 2 overrides the previous one, while the value for POS_ID 1 has not changed. In order to remove POS_ID 1, the table would change to something like:

AC_ID | POS_ID | ASAT       | VAL
    1 |      1 | 2016-01-01 | 100
    1 |      2 | 2016-01-01 | 200
    1 |      2 | 2016-01-02 | 250
    1 |      1 | 2016-01-03 | 0

Now the value changes to 250 on day 3.

I can calculate the value at any given date with a subquery like so:

SELECT SUM(VAL) FROM POSITION P1
WHERE P1.ASAT = 
  (SELECT MAX(P2.ASAT) FROM POSITION P2
   WHERE P1.AC_ID  = P2.AC_ID
   AND   P1.POS_ID = P2.POS_ID
   AND   P2.ASAT <= [CHOSEN DATE])
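
As a rough check of the point-in-time subquery, here is a minimal sketch that runs it against the day-3 sample data. It assumes an in-memory SQLite database driven from Python, dates stored as ISO-8601 text (so string comparison matches date order), and an extra `ac_id` filter that the query above does not have. The `total_as_of` helper is hypothetical and exists only for illustration.

```python
import sqlite3

# Assumption: in-memory SQLite with ISO-8601 text dates stands in for the real table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE position (ac_id INTEGER, pos_id INTEGER, asat TEXT, val INTEGER)"
)
conn.executemany(
    "INSERT INTO position VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2016-01-01", 100),
        (1, 2, "2016-01-01", 200),
        (1, 2, "2016-01-02", 250),
        (1, 1, "2016-01-03", 0),
    ],
)

# Same shape as the subquery above, with the date column named asat throughout.
# The ac_id filter is an addition so that each account is totalled separately.
POINT_IN_TIME = """
SELECT SUM(p1.val)
FROM position p1
WHERE p1.ac_id = :ac_id
  AND p1.asat = (SELECT MAX(p2.asat)
                 FROM position p2
                 WHERE p2.ac_id  = p1.ac_id
                   AND p2.pos_id = p1.pos_id
                   AND p2.asat  <= :chosen_date)
"""


def total_as_of(ac_id, chosen_date):
    """Hypothetical helper: sum the latest row per position on or before chosen_date."""
    params = {"ac_id": ac_id, "chosen_date": chosen_date}
    return conn.execute(POINT_IN_TIME, params).fetchone()[0]


totals = [total_as_of(1, d) for d in ("2016-01-01", "2016-01-02", "2016-01-03")]
print(totals)  # [300, 350, 250]
```

The three totals match the day 1, day 2 and day 3 values described above.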

What I'd like to do now is write a single query that will give me the total value for every AC_ID for every ASAT. If not for the delta storage mechanism, I could easily achieve this using:

SELECT AC_ID, ASAT, SUM(VAL) FROM POSITION
GROUP BY AC_ID, ASAT
ORDER BY ASAT DESC

What I'm looking for is something that achieves the above but takes into account the join back on the table. If I use the query above, I only get totals for the rows that changed on each ASAT date, not for all of the existing values that haven't changed.

For the example above, the result set should be:

AC_ID | ASAT       | SUM(VAL)
    1 | 2016-01-01 |      300
    1 | 2016-01-02 |      350
    1 | 2016-01-03 |      250

Here's another example of data vs output:

AC_ID | POS_ID | ASAT       | VAL
    1 |      1 | 2016-01-01 | 100
    1 |      2 | 2016-01-01 | 200
    1 |      2 | 2016-01-02 | 250
    1 |      1 | 2016-01-03 | 0
    2 |      1 | 2016-01-02 | 500
    3 |      7 | 2016-01-02 | 1000
    3 |      7 | 2016-01-03 | 1000
    3 |     12 | 2016-01-03 | 5000
    2 |      1 | 2016-01-04 | 750

Result

AC_ID | ASAT       | SUM(VAL)
    1 | 2016-01-01 |      300
    1 | 2016-01-02 |      350
    1 | 2016-01-03 |      250
    2 | 2016-01-02 |      500
    2 | 2016-01-04 |      750
    3 | 2016-01-02 |     1000
    3 | 2016-01-03 |     6000

I CHANGED HOW THIS WORKS

Although the answers below worked, their performance was shockingly bad (through no fault of the authors!). To get this to something acceptable (I need sub-second return), I refactored the table to include an end_date column. This column gets updated on each insert to set the life span of that row. If a row doesn't have a superseding entry, then the end date is set to 9999-12-31. My example above becomes:

AC_ID | POS_ID | ASAT       | END_DATE   | VAL
    1 |      1 | 2016-01-01 | 2016-01-03 |  100
    1 |      2 | 2016-01-01 | 2016-01-02 |  200
    1 |      2 | 2016-01-02 | 9999-12-31 |  250
    1 |      1 | 2016-01-03 | 9999-12-31 |    0
    2 |      1 | 2016-01-02 | 2016-01-04 |  500
    3 |      7 | 2016-01-02 | 2016-01-03 | 1000
    3 |      7 | 2016-01-03 | 9999-12-31 | 1000
    3 |     12 | 2016-01-03 | 9999-12-31 | 5000
    2 |      1 | 2016-01-04 | 9999-12-31 |  750
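
One way the END_DATE column could be maintained on insert is sketched below. This is only an illustration under stated assumptions, not necessarily the mechanism actually used: it assumes an in-memory SQLite database driven from Python, ISO-8601 text dates, and rows that arrive in date order for each position. The `insert_position` helper and the `OPEN_END` constant are hypothetical names.

```python
import sqlite3

OPEN_END = "9999-12-31"  # sentinel end date for rows that have not been superseded

# Assumption: in-memory SQLite with ISO-8601 text dates stands in for the real table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE position "
    "(ac_id INTEGER, pos_id INTEGER, asat TEXT, end_date TEXT, val INTEGER)"
)


def insert_position(ac_id, pos_id, asat, val):
    """Hypothetical helper: close the open row for this position, then add the new row."""
    with conn:  # single transaction, so both statements commit or neither does
        # End the life span of the row currently open for this account and position.
        conn.execute(
            "UPDATE position SET end_date = ? "
            "WHERE ac_id = ? AND pos_id = ? AND end_date = ?",
            (asat, ac_id, pos_id, OPEN_END),
        )
        # The new row stays open until a later row supersedes it.
        conn.execute(
            "INSERT INTO position VALUES (?, ?, ?, ?, ?)",
            (ac_id, pos_id, asat, OPEN_END, val),
        )


# Same rows, in the same order, as the table above.
for row in [
    (1, 1, "2016-01-01", 100),
    (1, 2, "2016-01-01", 200),
    (1, 2, "2016-01-02", 250),
    (1, 1, "2016-01-03", 0),
    (2, 1, "2016-01-02", 500),
    (3, 7, "2016-01-02", 1000),
    (3, 7, "2016-01-03", 1000),
    (3, 12, "2016-01-03", 5000),
    (2, 1, "2016-01-04", 750),
]:
    insert_position(*row)

rows = conn.execute(
    "SELECT ac_id, pos_id, asat, end_date, val FROM position ORDER BY rowid"
).fetchall()
for r in rows:
    print(r)
```

Because each insert only touches the single open row for that position, the sketch relies on rows never arriving out of date order; back-dated inserts would need extra handling.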

I can then remove the second join from the accepted answer and add an extra clause to the inner join.

SELECT
  p1.AC_ID, 
  p1.ASAT, 
  SUM(p2.VAL) as totalValue
FROM 
  (SELECT DISTINCT AC_ID, ASAT FROM position) p1
INNER JOIN position p2 ON
  p2.AC_ID    =  p1.AC_ID AND
  p2.ASAT     <= p1.ASAT AND
  p2.END_DATE >  p1.ASAT
GROUP BY 
  p1.AC_ID,
  p1.ASAT;

This should give you what you need:

SELECT
    P1.ac_id,
    P1.asat,
    SUM(P2.val) AS total_value
FROM
    (SELECT DISTINCT P.ac_id, P.asat FROM dbo.Position P) P1
INNER JOIN dbo.Position P2 ON
    P2.ac_id = P1.ac_id AND
    P2.asat <= P1.asat
LEFT OUTER JOIN dbo.Position P3 ON
    P3.ac_id = P1.ac_id AND
    P3.pos_id = P2.pos_id AND
    P3.asat > P2.asat AND
    P3.asat <= P1.asat
WHERE
    P3.ac_id IS NULL
GROUP BY
    P1.ac_id,
    P1.asat

The query first builds every ac_id / asat combination, then joins in every row for that account dated on or before that asat, and finally uses the LEFT OUTER JOIN with the check for NULL to eliminate any row that has a later row for the same pos_id within that window, leaving only the latest row for each pos_id.
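
To see the anti-join at work, the following sketch runs the same query against the second sample data set from the question. It assumes an in-memory SQLite database driven from Python with ISO-8601 text dates; the `dbo.` schema prefix is dropped because SQLite has no such schema, and an ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# Assumption: in-memory SQLite with ISO-8601 text dates stands in for dbo.Position.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE position (ac_id INTEGER, pos_id INTEGER, asat TEXT, val INTEGER)"
)
conn.executemany(
    "INSERT INTO position VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2016-01-01", 100),
        (1, 2, "2016-01-01", 200),
        (1, 2, "2016-01-02", 250),
        (1, 1, "2016-01-03", 0),
        (2, 1, "2016-01-02", 500),
        (3, 7, "2016-01-02", 1000),
        (3, 7, "2016-01-03", 1000),
        (3, 12, "2016-01-03", 5000),
        (2, 1, "2016-01-04", 750),
    ],
)

RUNNING_TOTALS = """
SELECT p1.ac_id, p1.asat, SUM(p2.val) AS total_value
FROM (SELECT DISTINCT ac_id, asat FROM position) p1
INNER JOIN position p2              -- candidate rows on or before the report date
    ON p2.ac_id = p1.ac_id
   AND p2.asat <= p1.asat
LEFT OUTER JOIN position p3         -- any newer row for the same position
    ON p3.ac_id  = p1.ac_id
   AND p3.pos_id = p2.pos_id
   AND p3.asat   > p2.asat
   AND p3.asat  <= p1.asat
WHERE p3.ac_id IS NULL              -- keep p2 only when no newer row exists
GROUP BY p1.ac_id, p1.asat
ORDER BY p1.ac_id, p1.asat
"""

results = conn.execute(RUNNING_TOTALS).fetchall()
for r in results:
    print(r)
```

The rows printed should match the expected result table in the question, for example 6000 for AC_ID 3 on 2016-01-03.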

This is not particularly efficient, but I think it should do what you want:

SELECT aa.AC_ID, aa.ASAT,  SUM(p.VAL)
FROM (SELECT DISTINCT AC_ID, ASAT FROM POSITION
     ) aa JOIN
     POSITION P
     ON p.AC_ID = aa.AC_ID and p.ASAT <= aa.ASAT
WHERE P.ASAT = (SELECT MAX(P2.ASAT)
                FROM POSITION P2
                WHERE P.AC_ID  = P2.AC_ID AND
                      P.POS_ID = P2.POS_ID AND
                      P2.ASAT <= aa.ASAT
               )
GROUP BY aa.AC_ID, aa.ASAT;
