Difference between rows within a group

Question

I am translating my R code into SQL.

My R code is below:

temp <- questions %>%
  select(UserId, ResultIndicator, ThemeId) %>%
  filter(UserId == 72) %>%
  group_by(ThemeId, ResultIndicator) %>%
  arrange(desc(ResultIndicator)) %>%
  summarise(Nominal = n()) %>% 
  mutate(Percent = Nominal/sum(Nominal)) %>%
  mutate(Percent = round(Percent, 3) * 100)  %>%  
  mutate(diff = Percent - lag(Percent, default = first(Percent)))

The output is the following:

structure(list(ThemeId = c(11L, 11L, 12L, 12L, 13L, 19L), ResultIndicator = c("Correct", 
"Wrong", "Correct", "Wrong", "Correct", "Wrong"), Nominal = c(34L, 
4L, 25L, 2L, 10L, 1L), Percent = c(89.5, 10.5, 92.6, 7.4, 100, 
100), diff = c(0, -79, 0, -85.2, 0, 0)), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L), vars = "ThemeId", labels = structure(list(
    ThemeId = c(11L, 12L, 13L, 19L)), class = "data.frame", row.names = c(NA, 
-4L), vars = "ThemeId", labels = structure(list(ThemeId = c(11L, 
12L, 13L, 19L, 22L, 33L, 35L, 38L, 48L, 56L, 59L, 62L, 71L, 77L
)), row.names = c(NA, -14L), class = "data.frame", vars = "ThemeId", drop = TRUE), indices = list(
    0:1, 2:3, 4L, 5L, 6:7, 8:9, 10:11, 12L, 13:14, 15:16, 17:18, 
    19:20, 21:22, 23:24), drop = TRUE, group_sizes = c(2L, 2L, 
1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), biggest_group_size = 2L), indices = list(
    0:1, 2:3, 4L, 5L), drop = TRUE, group_sizes = c(2L, 2L, 1L, 
1L), biggest_group_size = 2L)

For non-R users, the above translates to:

  ThemeId ResultIndicator Nominal Percent  diff
1      11         Correct      34    89.5   0.0
2      11           Wrong       4    10.5 -79.0
3      12         Correct      25    92.6   0.0
4      12           Wrong       2     7.4 -85.2
5      13         Correct      10   100.0   0.0
6      19           Wrong       1   100.0   0.0

My attempt in SQL is this:

SELECT Count(Id) as Nominal, ResultIndicator, ThemeId
FROM LogUserQuestions
WHERE UserId = 72
GROUP BY ThemeId, ResultIndicator
ORDER BY ThemeId

But I don't know how to calculate the lag. I tried:

(Nominal - lag(Nominal) over (partition by [not sure] order by [not sure])) as diff

But I can't use Nominal, as it is created afterwards.

Any hints?

Answer 1

I think it is something like this:

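SELECT ResultIndicator, ThemeId, COUNT(*) as Nominal, 
       -- share of each result within its theme; * 1.0 avoids integer division
       COUNT(*) * 1.0 / SUM(COUNT(*)) OVER (PARTITION BY ThemeId) as [Percent],
       -- difference from the previous row of the same theme (NULL for the first row)
       COUNT(*) - LAG(COUNT(*)) OVER (PARTITION BY ThemeId ORDER BY ResultIndicator) as diff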
FROM LogUserQuestions
WHERE UserId = 72
GROUP BY ThemeId, ResultIndicator
ORDER BY ResultIndicator DESC;
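
To try a query of this shape without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the bundled SQLite is 3.25 or newer (the release that added window functions). The rows are synthetic, rebuilt from the counts in the question's expected output, and run_single_query is a hypothetical helper name. Both window functions are partitioned by ThemeId so that the results line up with the R output; that grouping is an assumption on my part.

```python
import sqlite3

# Synthetic counts copied from the question's expected output (UserId 72).
COUNTS = [(11, "Correct", 34), (11, "Wrong", 4),
          (12, "Correct", 25), (12, "Wrong", 2),
          (13, "Correct", 10), (19, "Wrong", 1)]

# Window functions are evaluated after GROUP BY, so they can wrap COUNT(*).
QUERY = """
SELECT ThemeId, ResultIndicator, COUNT(*) AS Nominal,
       ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (PARTITION BY ThemeId), 1) AS Percent,
       COUNT(*) - LAG(COUNT(*)) OVER (PARTITION BY ThemeId ORDER BY ResultIndicator) AS diff
FROM LogUserQuestions
WHERE UserId = 72
GROUP BY ThemeId, ResultIndicator
ORDER BY ThemeId, ResultIndicator
"""


def run_single_query():
    """Build an in-memory table and return the aggregated rows (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE LogUserQuestions ("
        "Id INTEGER PRIMARY KEY, UserId INTEGER, ThemeId INTEGER, ResultIndicator TEXT)"
    )
    insert = "INSERT INTO LogUserQuestions (UserId, ThemeId, ResultIndicator) VALUES (?, ?, ?)"
    for theme, result, n in COUNTS:
        conn.executemany(insert, [(72, theme, result)] * n)
    # One row for a different user, to show that the WHERE clause filters it out.
    conn.execute(insert, (1, 11, "Wrong"))
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_single_query():
        print(row)
```

Because LAG() is called without a default here, the first row of each theme gets a NULL diff (None in Python) rather than 0.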

Answer 2

Actually, contrary to your supposition ("But I can't use Nominal, as it is created afterwards"), you can refer to aggregate columns by name if you wrap the aggregate query in a CTE. In fact, you may want to use multiple CTEs, since the calculated Percent is based on the earlier aggregate, Nominal. As @GordonLinoff shows, everything can be run in one query, but readability may then suffer.

WITH agg AS (
   SELECT ThemeId, ResultIndicator, COUNT(Id) AS Nominal
   FROM LogUserQuestions
   WHERE UserId = 72
   GROUP BY ThemeId, ResultIndicator
), pct AS (
   SELECT ResultIndicator, ThemeId, Nominal,
          -- 100.0 forces decimal division; partition by ThemeId only,
          -- so the shares add up to 100 within each theme
          ROUND(100.0 * Nominal / SUM(Nominal) OVER (PARTITION BY ThemeId), 1) AS [Percent]
   FROM agg
)

SELECT ResultIndicator, ThemeId, Nominal, [Percent],
       -- LAG() takes three arguments: expression, offset, default
       -- ascending order puts Correct before Wrong, as in the R output
       ([Percent] - LAG([Percent], 1, [Percent])
                        OVER (PARTITION BY ThemeId ORDER BY ResultIndicator)) AS diff
FROM pct
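
As a quick check of the CTE approach, the following is a minimal sketch that runs the same idea through Python's built-in sqlite3 module. It assumes SQLite 3.25 or newer for window functions, and it uses plain Percent instead of the bracketed [Percent] because it targets SQLite rather than SQL Server. The rows are synthetic, rebuilt from the counts in the question, and run_cte_query is a hypothetical helper name. The partition by ThemeId and the ascending order on ResultIndicator are chosen here to mirror the R output.

```python
import sqlite3

# Synthetic counts copied from the question's expected output (UserId 72).
COUNTS = [(11, "Correct", 34), (11, "Wrong", 4),
          (12, "Correct", 25), (12, "Wrong", 2),
          (13, "Correct", 10), (19, "Wrong", 1)]

CTE_QUERY = """
WITH agg AS (
    -- Step 1: aggregate, so later steps can refer to Nominal by name.
    SELECT ThemeId, ResultIndicator, COUNT(Id) AS Nominal
    FROM LogUserQuestions
    WHERE UserId = 72
    GROUP BY ThemeId, ResultIndicator
), pct AS (
    -- Step 2: percentage of each result within its theme.
    SELECT ThemeId, ResultIndicator, Nominal,
           ROUND(100.0 * Nominal / SUM(Nominal) OVER (PARTITION BY ThemeId), 1) AS Percent
    FROM agg
)
-- Step 3: difference to the previous row of the theme; the third LAG()
-- argument makes the first row of each theme compare against itself (diff 0).
SELECT ThemeId, ResultIndicator, Nominal, Percent,
       ROUND(Percent - LAG(Percent, 1, Percent)
                       OVER (PARTITION BY ThemeId ORDER BY ResultIndicator), 1) AS diff
FROM pct
ORDER BY ThemeId, ResultIndicator
"""


def run_cte_query():
    """Build an in-memory table and return the CTE query's rows (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE LogUserQuestions ("
        "Id INTEGER PRIMARY KEY, UserId INTEGER, ThemeId INTEGER, ResultIndicator TEXT)"
    )
    insert = "INSERT INTO LogUserQuestions (UserId, ThemeId, ResultIndicator) VALUES (?, ?, ?)"
    for theme, result, n in COUNTS:
        conn.executemany(insert, [(72, theme, result)] * n)
    rows = conn.execute(CTE_QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_cte_query():
        print(row)
```

On this sample data the diff column comes out as 0, -79, 0, -85.2, 0, 0, which matches the table in the question.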

Answer 3

I don't have your data in a SQL table, so this is a little difficult. What I usually do in these situations is wrap the whole aggregate in a subquery and then select from that, so the outer query can refer to Nominal by name. Something like this:

SELECT Nominal, 
  ResultIndicator, 
  ThemeId, 
  -- partition by ThemeId so each row is compared with the previous row of the same theme
  (Nominal - lag(Nominal) over (partition by ThemeId order by ResultIndicator)) as diff
FROM
(
  -- no ORDER BY in here: SQL Server rejects it in a derived table without TOP
  SELECT Count(Id) as Nominal, ResultIndicator, ThemeId
  FROM LogUserQuestions
  WHERE UserId = 72
  GROUP BY ThemeId, ResultIndicator
) sub
order by Nominal

3 answers

solution1
2 ACCEPTED 2019-12-10 20:42:03

solution2
2 2019-12-10 20:51:26

solution3
1 2019-12-10 20:52:35