
Generate 'average' column from sub query and ROW_NUMBER window function in SQL SELECT

I have the following SQL Server tables (with sample data):

Questionnaire

id | coachNodeId | youngPersonNodeId | complete
1  | 12          | 678               | 1
2  | 12          | 52                | 1
3  | 30          | 99                | 1
4  | 12          | 678               | 1
5  | 12          | 678               | 1
6  | 30          | 99                | 1
7  | 12          | 52                | 1
8  | 30          | 102               | 1

Answer

id | questionnaireId | score
1  | 1               | 1
2  | 2               | 3
3  | 2               | 2
4  | 2               | 5
5  | 3               | 5
6  | 4               | 5
7  | 4               | 3
8  | 5               | 4
9  | 6               | 1
10 | 6               | 3
11 | 7               | 5
12 | 8               | 5

ContentNode

id  | text
12  | Zak
30  | Phil
52  | Jane
99  | Ali
102 | Ed
678 | Chris 

I have the following T-SQL query:

SELECT
    Questionnaire.id AS questionnaireId, 
    coachNodeId AS coachNodeId, 
    coachNode.[text] AS coachName, 
    youngPersonNodeId AS youngPersonNodeId, 
    youngPersonNode.[text] AS youngPersonName,
    ROW_NUMBER() OVER (PARTITION BY Questionnaire.coachNodeId, Questionnaire.youngPersonNodeId ORDER BY Questionnaire.id) AS questionnaireNumber,
    score = (SELECT AVG(score) FROM Answer WHERE Answer.questionnaireId = Questionnaire.id)
FROM            
    Questionnaire
LEFT JOIN 
    ContentNode AS coachNode ON Questionnaire.coachNodeId = coachNode.id 
LEFT JOIN 
    ContentNode AS youngPersonNode ON Questionnaire.youngPersonNodeId = youngPersonNode.id
WHERE        
    (complete = 1)
ORDER BY 
    coachNodeId, youngPersonNodeId

This query outputs the following example data:

questionnaireId | coachNodeId | coachName | youngPersonNodeId | youngPersonName | questionnaireNumber | score
1               | 12          | Zak       | 678               | Chris           | 1                   | 1
2               | 12          | Zak       | 52                | Jane            | 1                   | 3
3               | 30          | Phil      | 99                | Ali             | 1                   | 5
4               | 12          | Zak       | 678               | Chris           | 2                   | 4
5               | 12          | Zak       | 678               | Chris           | 3                   | 4
6               | 30          | Phil      | 99                | Ali             | 2                   | 2
7               | 12          | Zak       | 52                | Jane            | 2                   | 5
8               | 30          | Phil      | 102               | Ed              | 1                   | 5

To explain what's happening here: coaches undertake questionnaires with young people and log the scores. A coach might later repeat the questionnaire with the same young person several times, hoping that the young person gets a better score. Ultimately, the coaches' managers want to see how well the coaches are performing, so they'd like to know whether questionnaire scores tend to go up over time. The window function establishes how many times the questionnaire has been undertaken by the same coach/young person combination.

I need to be able to determine the average score based on the questionnaire number. So for example, the coach 'Zak' logged scores of '1' and '3' for his first questionnaires (where questionnaireNumber = 1) so the average would be 2. For his second questionnaires (where questionnaireNumber = 2) the scores were '4' and '5', so the average would be 4.5, which comes out as 4 when the scores are averaged as integers. So in analysing this data we know that over time Zak's questionnaire scores have improved from an average of '2' the first time to an average of '4' the second time.

I feel like the query needs to be grouped by the coachNodeId and questionnaireNumber values, so it would output something like this. (I've omitted the questionnaireId, youngPersonNodeId, youngPersonName and score columns: they are only used to derive averageScore and wouldn't be useful with the results grouped this way.)

coachNodeId | coachName | questionnaireNumber | averageScore
12          | Zak       | 1                   | 2                      (calculation: (1 + 3) / 2)
12          | Zak       | 2                   | 4                      (calculation: (4 + 5) / 2, truncated to an integer)
12          | Zak       | 3                   | 4                      (only one value: 4)
30          | Phil      | 1                   | 5                      (calculation: (5 + 5) / 2)
30          | Phil      | 2                   | 2                      (only one value: 2)

Could anyone suggest how I can modify my query to output the average scores based on the score from the sub-query and the ROW_NUMBER window function? I've hit the limits of my SQL skills!

Many thanks.

It is a bit hard to tell without sample data, but I think you are describing aggregation:

-- Average of all answer scores per coach / young person pair
SELECT q.coachNodeId AS coachNodeId, 
       cn.[text] AS coachName, 
       q.youngPersonNodeId AS youngPersonNodeId, 
       ypn.[text] AS youngPersonName,
       AVG(a.score) AS averageScore
FROM Questionnaire q JOIN
     ContentNode cn
     ON q.coachNodeId = cn.id JOIN
     ContentNode ypn
     ON q.youngPersonNodeId = ypn.id LEFT JOIN
     Answer a
     ON a.questionnaireId = q.id
WHERE q.complete = 1
-- Column aliases are not allowed in GROUP BY, so list the bare expressions
GROUP BY q.coachNodeId, cn.[text],
         q.youngPersonNodeId, ypn.[text]
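
The query above groups by coach and young person. If the goal is instead one row per coach and questionnaire number, as in the expected output in the question, one possible approach is to compute the ROW_NUMBER and the per-questionnaire score in a common table expression and then aggregate over it. This is only a sketch: the CTE name numbered is made up for illustration, and it assumes score is an integer column, which the truncated averages in the question's output suggest.

```sql
-- Sketch: number each questionnaire per coach / young person pair,
-- then average the per-questionnaire scores by coach and questionnaire number.
-- "numbered" is a hypothetical CTE name, not part of the original schema.
WITH numbered AS (
    SELECT
        q.coachNodeId,
        cn.[text] AS coachName,
        ROW_NUMBER() OVER (
            PARTITION BY q.coachNodeId, q.youngPersonNodeId
            ORDER BY q.id
        ) AS questionnaireNumber,
        -- Per-questionnaire average, same as the sub-query in the question
        (SELECT AVG(a.score)
         FROM Answer AS a
         WHERE a.questionnaireId = q.id) AS score
    FROM Questionnaire AS q
    LEFT JOIN ContentNode AS cn
        ON q.coachNodeId = cn.id
    WHERE q.complete = 1
)
SELECT
    coachNodeId,
    coachName,
    questionnaireNumber,
    AVG(score) AS averageScore
FROM numbered
GROUP BY coachNodeId, coachName, questionnaireNumber
ORDER BY coachNodeId, questionnaireNumber;
```

A window function cannot be used directly in GROUP BY, which is why the numbering happens in the CTE first. Note that AVG over an integer column returns an integer in SQL Server, so both averages here are truncated. If fractional averages are wanted, casting the input (for example AVG(CAST(a.score AS decimal(9, 2)))) should avoid that, at the cost of results that no longer match the whole numbers shown in the question.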
