
Oracle SQL max() with group by - duplicated values

This is a simplified version of a query I created. It gives me what I want (a list of all stud_id values with the selected cpnt_id, whether or not there is a value in compl_dte), but only if the UserInput for Item is limited to a single record.

select stud.*, lrnhist.* from
(select s.stud_id,
        i.cpnt_id
from student s, item i
where s.stud_id in [UserInput]
  and i.cpnt_id in [UserInput]
) stud
left outer join
(select lh.stud_id,
        lh.cpnt_id,
        max(lh.compl_dte) compl_dte
from learnhist lh
where lh.cpnt_id in [UserInput]
group by lh.stud_id, lh.cpnt_id
) lrnhist
on stud.stud_id = lrnhist.stud_id

When it is run with a UserInput that specifies 2 or more Items, it returns the correct rows, but the returned compl_dte is always identical for each stud_id (because of the max(compl_dte), I'm sure). I'm just not sure what I need to do to make sure the returned compl_dte is the max for the stud_id/cpnt_id pair, not the max for the stud_id regardless of cpnt_id.

Table values:

student:

stud_id
1
2
3
4

item:

cpnt_id
a
b
c
d

learnhist:

stud_id cpnt_id compl_dte
1    a    5/5/2017
1    a    3/3/2016
1    b    10/10/2016
2    c    8/8/2016
3    b    2/2/2017

Results where UserInput is stud_id = * and cpnt_id = a:

stud_id cpnt_id compl_dte
1    a    5/5/2017
2    a
3    a
4    a

which is correct. Results where UserInput is stud_id = * and cpnt_id = both a and b:

stud_id cpnt_id compl_dte
1    a    5/5/2017
1    b    5/5/2017
2    a
2    b
3    a    2/2/2017
3    b    2/2/2017
4    a
4    b

which is not what I'm looking for. Results I'm looking for in that case:

stud_id cpnt_id compl_dte
1    a    5/5/2017
1    b    10/10/2016
2    a
2    b
3    a
3    b    2/2/2017
4    a
4    b

First post here, hopefully that all makes sense and I've asked in the right place!

I believe the problem may be a missing join predicate between the STUD and LRNHIST inline views.
In the query you provided, the STUD inline view is a Cartesian product of STUDENT and ITEM, which is then outer joined to the LRNHIST view, which does have one COMPL_DTE per STUD_ID / CPNT_ID pair. But since the OUTER JOIN only has a predicate on STUD_ID, you'll also get matches where STUD.CPNT_ID <> LRNHIST.CPNT_ID, producing extra rows.

You can break it down and look at the inline views individually:

For the STUD query:

SELECT STUDENT.STUD_ID, ITEM.CPNT_ID FROM STUDENT 
CROSS JOIN ITEM
WHERE STUDENT.STUD_ID IN (1,2,3,4)
AND ITEM.CPNT_ID IN ('a','b','c','d');

Result:

stud_id     cpnt_id
1   a
1   b
1   c
1   d
2   a
2   b
2   c
2   d
... etc

So we can expect all these rows in the final query.

If you look at LRNHIST individually:

SELECT LEARNHIST.STUD_ID,
                LEARNHIST.CPNT_ID,
                 MAX(LEARNHIST.COMPL_DTE) COMPL_DTE
                 FROM LEARNHIST
                 GROUP BY LEARNHIST.STUD_ID, LEARNHIST.CPNT_ID;

There is indeed only one row per stud_id-cpnt_id pair (that exists in learnhist):

stud_id     cpnt_id     compl_dte
1   b   October, 10 2016 00:00:00
1   a   May, 05 2017 00:00:00
3   b   February, 02 2017 00:00:00
2   c   August, 08 2016 00:00:00

Now if you join using only STUD_ID, you'll get a May 5th row where STUD has 1-a and LRNHIST has 1-a, but you'll also get a row where STUD has 1-a and LRNHIST has 1-b, because there is no join predicate on CPNT_ID. If you select all five columns, you can see where the duplication comes in:

SELECT STUD.*, LRNHIST.* FROM (
SELECT STUDENT.STUD_ID, ITEM.CPNT_ID FROM STUDENT 
CROSS JOIN ITEM
WHERE STUDENT.STUD_ID IN (1,2,3,4)
AND ITEM.CPNT_ID IN ('a','b','c','d')) STUD
LEFT OUTER JOIN (SELECT LEARNHIST.STUD_ID,
                LEARNHIST.CPNT_ID,
                 MAX(LEARNHIST.COMPL_DTE) COMPL_DTE
                 FROM LEARNHIST
                 GROUP BY LEARNHIST.STUD_ID, LEARNHIST.CPNT_ID
                ) LRNHIST
ON STUD.STUD_ID = LRNHIST.STUD_ID
ORDER BY 1 ASC, 2 ASC, 3 ASC, 4 ASC, 5 ASC;

Result:

s_stud  s_cpnt  l_stud  l_cpnt  l_compl

1   a   1   a   May, 05 2017 00:00:00
1   a   1   b   October, 10 2016 00:00:00
1   b   1   a   May, 05 2017 00:00:00
1   b   1   b   October, 10 2016 00:00:00
1   c   1   a   May, 05 2017 00:00:00
1   c   1   b   October, 10 2016 00:00:00
1   d   1   a   May, 05 2017 00:00:00
1   d   1   b   October, 10 2016 00:00:00
2   a   2   c   August, 08 2016 00:00:00
... etc

Because this joins only on stud_id, each STUD row for student 1 (1-a, for example) matches both of LRNHIST's rows for student 1: the 1-a group (May) and the 1-b group (Oct).
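
To make the fan-out concrete, here is a minimal sketch that loads the question's sample data into an in-memory SQLite database (not Oracle) and counts how many result rows each stud_id/cpnt_id pair produces under each join predicate. The ISO date strings, the `rows_per_pair` helper, and the use of Python's `sqlite3` module are assumptions made for illustration only.

```python
import sqlite3

# In-memory stand-in for the question's tables (SQLite, not Oracle).
# Dates are stored as ISO strings so MAX() orders them chronologically.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (stud_id INTEGER);
    CREATE TABLE item (cpnt_id TEXT);
    CREATE TABLE learnhist (stud_id INTEGER, cpnt_id TEXT, compl_dte TEXT);
    INSERT INTO student VALUES (1), (2), (3), (4);
    INSERT INTO item VALUES ('a'), ('b'), ('c'), ('d');
    INSERT INTO learnhist VALUES
        (1, 'a', '2017-05-05'), (1, 'a', '2016-03-03'),
        (1, 'b', '2016-10-10'), (2, 'c', '2016-08-08'),
        (3, 'b', '2017-02-02');
""")

QUERY = """
    SELECT stud.stud_id, stud.cpnt_id, lrnhist.cpnt_id, lrnhist.compl_dte
    FROM (SELECT s.stud_id AS stud_id, i.cpnt_id AS cpnt_id
          FROM student s CROSS JOIN item i) stud
    LEFT OUTER JOIN (SELECT stud_id, cpnt_id, MAX(compl_dte) AS compl_dte
                     FROM learnhist
                     GROUP BY stud_id, cpnt_id) lrnhist
    ON {predicate}
"""

def rows_per_pair(predicate):
    """Hypothetical helper: count result rows per (stud_id, cpnt_id) pair."""
    counts = {}
    for stud_id, cpnt_id, _, _ in conn.execute(QUERY.format(predicate=predicate)):
        counts[(stud_id, cpnt_id)] = counts.get((stud_id, cpnt_id), 0) + 1
    return counts

stud_only = rows_per_pair("stud.stud_id = lrnhist.stud_id")
both_keys = rows_per_pair(
    "stud.stud_id = lrnhist.stud_id AND stud.cpnt_id = lrnhist.cpnt_id"
)

print("stud_id only:", sum(stud_only.values()), "rows")
print("stud_id and cpnt_id:", sum(both_keys.values()), "rows")
```

With the join on stud_id alone, every pair for student 1 should appear twice, because LRNHIST holds two groups for that student; adding the cpnt_id predicate should bring every pair back to exactly one row.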

Now if you join on CPNT_ID as well, only the LRNHIST records that match both CPNT_ID and STUD_ID will be returned (May for 1-a and Oct for 1-b):

SELECT STUD.STUD_ID, STUD.CPNT_ID, LRNHIST.COMPL_DTE FROM (
SELECT STUDENT.STUD_ID, ITEM.CPNT_ID FROM STUDENT 
CROSS JOIN ITEM
WHERE STUDENT.STUD_ID IN (1,2,3,4)
AND ITEM.CPNT_ID IN ('a','b','c','d')) STUD
LEFT OUTER JOIN (SELECT LEARNHIST.STUD_ID,
                LEARNHIST.CPNT_ID,
                 MAX(LEARNHIST.COMPL_DTE) COMPL_DTE
                 FROM LEARNHIST
                 GROUP BY LEARNHIST.STUD_ID, LEARNHIST.CPNT_ID
                ) LRNHIST
ON STUD.STUD_ID = LRNHIST.STUD_ID
AND STUD.CPNT_ID = LRNHIST.CPNT_ID
ORDER BY 1 ASC, 2 ASC;

Result:

stud_id     cpnt_id     compl_dte
1   a   May, 05 2017 00:00:00
1   b   October, 10 2016 00:00:00
1   c   (null)
1   d   (null)
2   a   (null)
2   b   (null)
2   c   August, 08 2016 00:00:00
2   d   (null)
... etc

Now you should have only one row per STUD_ID / CPNT_ID pair, with nulls for compl_dte where no LRNHIST record matches.
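
As a quick check against the case from the question (cpnt_id = both a and b), the sketch below runs the two-column join on the sample data in an in-memory SQLite database. SQLite stands in for Oracle here, the dates are stored as ISO strings, and `latest_completions` is a hypothetical helper name; bind parameters replace the question's [UserInput] placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (stud_id INTEGER);
    CREATE TABLE item (cpnt_id TEXT);
    CREATE TABLE learnhist (stud_id INTEGER, cpnt_id TEXT, compl_dte TEXT);
    INSERT INTO student VALUES (1), (2), (3), (4);
    INSERT INTO item VALUES ('a'), ('b'), ('c'), ('d');
    INSERT INTO learnhist VALUES
        (1, 'a', '2017-05-05'), (1, 'a', '2016-03-03'),
        (1, 'b', '2016-10-10'), (2, 'c', '2016-08-08'),
        (3, 'b', '2017-02-02');
""")

def latest_completions(conn, cpnt_ids):
    """Hypothetical helper: latest compl_dte per student for the given items."""
    placeholders = ", ".join("?" for _ in cpnt_ids)
    sql = f"""
        SELECT stud.stud_id, stud.cpnt_id, lrnhist.compl_dte
        FROM (SELECT s.stud_id AS stud_id, i.cpnt_id AS cpnt_id
              FROM student s CROSS JOIN item i
              WHERE i.cpnt_id IN ({placeholders})) stud
        LEFT OUTER JOIN (SELECT stud_id, cpnt_id, MAX(compl_dte) AS compl_dte
                         FROM learnhist
                         GROUP BY stud_id, cpnt_id) lrnhist
          ON stud.stud_id = lrnhist.stud_id
         AND stud.cpnt_id = lrnhist.cpnt_id   -- the predicate the question lacked
        ORDER BY stud.stud_id, stud.cpnt_id
    """
    return conn.execute(sql, list(cpnt_ids)).fetchall()

rows = latest_completions(conn, ["a", "b"])
for row in rows:
    print(row)
```

The eight rows returned should line up with the "results I'm looking for" table in the question, with `None` wherever a student has no learnhist record for that item.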

Use a factored subquery.

WITH all_ids AS (
SELECT s.stud_id as stud_id,
       i.cpnt_id as cpnt_id
  FROM student s
CROSS JOIN item i )
SELECT stud_id, cpnt_id, max(lh.compl_dte) as compl_dte
  FROM all_ids
LEFT JOIN learnhist lh USING (stud_id, cpnt_id)
 WHERE cpnt_id IN ('a', 'b')
GROUP BY stud_id, cpnt_id
ORDER BY stud_id;
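
The factored-subquery version can be tried the same way. This is only a sketch: it runs the WITH query on the question's sample data in an in-memory SQLite database rather than Oracle, stores the dates as ISO strings, and adds cpnt_id to the ORDER BY so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (stud_id INTEGER);
    CREATE TABLE item (cpnt_id TEXT);
    CREATE TABLE learnhist (stud_id INTEGER, cpnt_id TEXT, compl_dte TEXT);
    INSERT INTO student VALUES (1), (2), (3), (4);
    INSERT INTO item VALUES ('a'), ('b'), ('c'), ('d');
    INSERT INTO learnhist VALUES
        (1, 'a', '2017-05-05'), (1, 'a', '2016-03-03'),
        (1, 'b', '2016-10-10'), (2, 'c', '2016-08-08'),
        (3, 'b', '2017-02-02');
""")

# Joining on both keys via USING, then aggregating, keeps one row per pair.
CTE_QUERY = """
    WITH all_ids AS (
        SELECT s.stud_id AS stud_id, i.cpnt_id AS cpnt_id
        FROM student s CROSS JOIN item i
    )
    SELECT stud_id, cpnt_id, MAX(lh.compl_dte) AS compl_dte
    FROM all_ids
    LEFT JOIN learnhist lh USING (stud_id, cpnt_id)
    WHERE cpnt_id IN ('a', 'b')
    GROUP BY stud_id, cpnt_id
    ORDER BY stud_id, cpnt_id
"""

cte_rows = conn.execute(CTE_QUERY).fetchall()
for row in cte_rows:
    print(row)
```

Because USING (stud_id, cpnt_id) joins on both columns, the MAX is taken per pair, and pairs with no learnhist record still appear with a null compl_dte.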
