
SQL Server - Only get a single row for each distinct value of a field

I just got home from a job interview in which they made me take a programming test. One of the questions which really stumped me was as follows:

You are a teacher at a high school and have been put in charge of picking the best possible debate team for the upcoming National Debate Championships. Given the following table structure:

CREATE TABLE CompetitionResults (
    StudentName       NVARCHAR(255) NOT NULL, -- The student's name
    SchoolYear        INT NOT NULL,           -- The school year of the student at the time they entered the competition
    CompetitionDate   DATE NOT NULL,          -- The date of the competition
    CompetitionResult INT NOT NULL            -- The student's final score in the competition (0 - 100)
)

Write a query that will return the names of the best candidates for the upcoming competition, based on their previous competition results.

Constraints:

  • Return a single column, StudentName .
  • Only one student should be picked from each school year (7 - 12).
  • Each returned student must have competed in exactly 3 other competitions this year.

It's the last constraint in particular that I had the most trouble with. Here's what I ended up submitting after running out of time:

SELECT
    StudentName AS sn,
    (SELECT COUNT(*) AS NumComps, CompetitionDate FROM CompetitionResults
        WHERE YEAR(CompetitionDate) = 2020 AND NumComps = 3),
    SchoolYear,
    CompetitionDate,
    CompetitionResult
FROM CompetitionResults
WHERE CompetitionDate IN (SELECT MIN(CompetitionDate)
    FROM CompetitionResults GROUP BY CompetitionDate) AND
    CompetitionResult IN (SELECT MAX(CompetitionResult) FROM
    CompetitionResults WHERE StudentName = sn);

In the interest of professional growth, I'd love to be able to tackle this problem with as little help as possible, but as you can probably tell, I'm really struggling here. This code won't even compile, never mind the performance implications of all the subqueries! I find subqueries easier to write than joins, however, hence my use of them here.

Any guidance/tips would be very much appreciated. MTIA :-)

To me, this is basically aggregation . . . with a little bit of window functions:

select StudentName, SchoolYear, avg_result
from (select StudentName, SchoolYear,
             avg(CompetitionResult * 1.0) as avg_result,  -- * 1.0 avoids integer averaging
             row_number() over (partition by SchoolYear order by avg(CompetitionResult * 1.0) desc) as seqnum
      from CompetitionResults cr
      where year(CompetitionDate) = year(getdate())
      group by StudentName, SchoolYear
      having count(*) = 3
     ) s
where seqnum = 1;

The subquery summarizes the competitions for each student, applying the appropriate filtering conditions, both on the individual competitions (this year only) and on the overall number (exactly three). The outer query chooses one student per school year.

I don't see what having exactly three competitions has to do with being the best. I suspect that the part about choosing the best students based on scores is a "hidden requirement" used to distinguish merely acceptable solutions from the best ones.

I suppose there could be additional logic to check that there is at least one candidate per year, but the question suggests that there is at least one such student.
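If that check were wanted, one way to sketch it (an assumption on my part, not something the task asks for) is to left join the list of school years 7 to 12 against the students who qualify and report the years with no match. The aliases y and q are just illustrative names.

-- Sketch: list the school years (7 - 12) that have no student with
-- exactly three competitions in the current calendar year.
SELECT y.SchoolYear
FROM (VALUES (7), (8), (9), (10), (11), (12)) AS y (SchoolYear)
LEFT JOIN (
    -- students with exactly three competitions this year
    SELECT StudentName, SchoolYear
    FROM CompetitionResults
    WHERE YEAR(CompetitionDate) = YEAR(GETDATE())
    GROUP BY StudentName, SchoolYear
    HAVING COUNT(*) = 3
) AS q
    ON q.SchoolYear = y.SchoolYear
WHERE q.StudentName IS NULL   -- keep only years with no qualifying student
ORDER BY y.SchoolYear;

An empty result would mean every school year has at least one candidate.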

I guess this can be solved with window functions. Here is an example; it might need some tweaking, but you should get the idea:

DECLARE @t TABLE(
  StudentName NVARCHAR(255)
 ,SchoolYear INT
 ,CompetitionDate DATE
 ,CompetitionResult INT
)

INSERT INTO @t VALUES
('Peter', 7, '2019-01-01', 100)
,('Peter', 8, '2020-01-01', 100)
,('Peter', 8, '2020-03-01', 100)
,('Paul', 10, '2020-01-01', 100)
,('Paul', 10, '2020-03-01', 100)
,('Paul', 10, '2020-04-01', 100)
,('Mary', 11, '2019-01-01', 100)
,('Mary', 11, '2019-02-01', 100)
,('Mary', 11, '2019-03-01', 100)
,('Jacob', 12, '2020-01-01', 100)
,('Jacob', 12, '2020-02-01', 100)
,('Jacob', 12, '2020-03-01', 100)
,('Jacob', 12, '2020-04-01', 90)
,('Jennifer', 9, '2020-03-01', 100)
,('Jennifer', 9, '2020-04-01', 100)
,('Jennifer', 9, '2020-05-01', 100)
,('Lucas', 12, '2020-03-01', 100)
,('Lucas', 12, '2020-04-01', 100)
,('Lucas', 12, '2020-05-01', 100)

;WITH cte AS(
SELECT *
      ,COUNT(CASE WHEN YEAR(CompetitionDate) = YEAR(GETDATE()) THEN 1 ELSE NULL END) OVER (PARTITION BY StudentName, YEAR(CompetitionDate)) AS CountCompYear
      ,ROW_NUMBER() OVER (PARTITION BY StudentName ORDER BY CompetitionDate DESC) AS LastCompetition
      
  FROM @t
),
cteFilter AS(
SELECT *, ROW_NUMBER() OVER (PARTITION BY SchoolYear ORDER BY CompetitionResult DESC, StudentName ASC) AS DistStudent
  FROM cte
  WHERE CountCompYear = 3
    AND LastCompetition = 1
)
SELECT *
  FROM cteFilter
  WHERE DistStudent = 1
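
As one possible tweak, here is a sketch that ranks students by the average of their results in the target year rather than by their most recent result, and returns only the StudentName column. Everything new here is an assumption for illustration: the @TargetYear variable (fixed so the sample rows still match once the calendar year changes), the @sample table, and the made-up names and scores.

-- Assumption: a fixed year instead of YEAR(GETDATE())
DECLARE @TargetYear INT = 2020;

-- Hypothetical sample data, made up for illustration
DECLARE @sample TABLE(
  StudentName NVARCHAR(255) NOT NULL
 ,SchoolYear INT NOT NULL
 ,CompetitionDate DATE NOT NULL
 ,CompetitionResult INT NOT NULL
);

INSERT INTO @sample VALUES
 ('Jacob', 12, '2020-01-01', 100)
,('Jacob', 12, '2020-02-01', 100)
,('Jacob', 12, '2020-03-01', 100)
,('Jacob', 12, '2020-04-01', 90)   -- four competitions, so Jacob is excluded
,('Lucas', 12, '2020-03-01', 80)
,('Lucas', 12, '2020-04-01', 90)
,('Lucas', 12, '2020-05-01', 100)  -- average 90
,('Nora', 12, '2020-03-01', 95)
,('Nora', 12, '2020-04-01', 95)
,('Nora', 12, '2020-05-01', 95)    -- average 95
,('Paul', 10, '2020-01-01', 70)
,('Paul', 10, '2020-03-01', 70)
,('Paul', 10, '2020-04-01', 70);

WITH PerStudent AS(
  -- one row per student with exactly three competitions in the target year
  SELECT StudentName, SchoolYear
        ,AVG(CompetitionResult * 1.0) AS AvgResult  -- * 1.0 avoids integer averaging
    FROM @sample
    WHERE YEAR(CompetitionDate) = @TargetYear
    GROUP BY StudentName, SchoolYear
    HAVING COUNT(*) = 3
),
Ranked AS(
  -- best average first; the name is a deterministic tie-breaker
  SELECT StudentName, SchoolYear
        ,ROW_NUMBER() OVER (PARTITION BY SchoolYear ORDER BY AvgResult DESC, StudentName ASC) AS rn
    FROM PerStudent
)
SELECT StudentName
  FROM Ranked
  WHERE rn = 1
  ORDER BY SchoolYear;

With these sample rows the query should return Paul (year 10) and Nora (year 12). Jacob drops out because he has four competitions in 2020, and Lucas loses to Nora on average score.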
