NESTED SELECT, UNION, LEFT JOIN

Question

I have a query involving three tables:

Employee
Attendance
Category

where

the PK of Employee is Id;
the PK of Category is Staff_id;
the PK of Attendance is attendance_Id;
Employee has a foreign key Staff referencing Category.Staff_id;
Attendance has a foreign key Id referencing Employee.Id.

I need to modify my query to provide an additional column, position, drawn from a fourth table, Position, and to group the results by Position.position and Employee.Staff. I cannot modify any table's structure or content.

Result rows should look like this, where "Driver" corresponds to Staff = 2:

Position | TotalEmp | TotalAttendance | TimeIn | TimeOut
Driver   | 5        | 5               | 8.00am | 6.00pm

Here is my current query:

SELECT D.TotalEmp, D.TotalAttendance, D.Timein, D.TimeOut
FROM (
  SELECT B.TotalEmp, B.TimeIn, B.TimeOut FROM (
    SELECT
      (SELECT COUNT (distinct Id) FROM Employee WHERE Staff = 2) AS TotalEmp,
      (
        SELECT COUNT(id)
        FROM Attendance Q
        WHERE
          id IN (SELECT (Id) FROM Employee WHERE Staff = 2) 
          AND  CONVERT(datetime, CONVERT(nvarchar(10), Q.timeInDate, 103), 103) = '20/11/2014'
      ) AS TotalAttendance, 
      (
        SELECT MIN(CONVERT(VARCHAR(8),I.timeInDate,108))
        FROM Attendance I
        WHERE
          CONVERT(datetime, CONVERT(nvarchar(10), I.timeInDate, 103), 103) = '20/11/2014'
          AND I.id IN (SELECT (Id) FROM Employee WHERE Staff = 2)
      ) Timein,
      (
        SELECT
          MAX(CONVERT(VARCHAR(8),O.timeOutDate,108))
        FROM Attendance O
        WHERE
          CONVERT(datetime, CONVERT(nvarchar(10), O.timeOutDate, 103), 103) = '20/11/2014'
          AND O.id IN (SELECT (Id) FROM Employee WHERE Staff = 2)
      ) TimeOut
    FROM Employee
    WHERE Id IN (SELECT (id) FROM Attendance) 
  ) B 

  UNION

  SELECT C.TotalEmp, C.Time, C.TimeOut FROM (
    SELECT
      (SELECT COUNT (distinct Id) FROM Employee WHERE Staff = 1) AS TotalEmployee, 
      ( 
        SELECT COUNT(id)
        FROM Attendance R
        WHERE
          id IN (SELECT (Id) FROM Employee WHERE Staff = 1) 
          AND CONVERT(datetime, CONVERT(nvarchar(10), R.timeInDate, 103), 103) = '20/11/2014'
      ) AS TotalAttendance,
      (
        SELECT MIN(CONVERT(VARCHAR(8), T.timeInDate, 108))
        FROM Attendance T
        WHERE
          CONVERT(datetime, CONVERT(nvarchar(10), T.timeInDate, 103), 103) = '20/11/2014'
          AND T.id IN (SELECT (Id) FROM Employee WHERE Staff = 1)
      ) Timein,
      (
        SELECT MAX(CONVERT(VARCHAR(8),X.timeOutDate,108))
        FROM Attendance X
        WHERE
          CONVERT(datetime, CONVERT(nvarchar(10), X.timeOutDate, 103), 103) = '20/11/2014'
          AND X.id IN (SELECT (Id) FROM Employee WHERE Staff = 1)
      ) TimeOut
    FROM Employee
    WHERE Id IN (SELECT (id) FROM Attendance) 
  ) C
) D

GROUP BY D.TotalEmp, D.TotalAttendance, D.Timein, D.TimeOut

How can I modify my query to produce the required result?

Answer 1

I hope you'll forgive me for saying that your original query is pretty horrible. It uniformly performs subqueries where joins would be more appropriate, and it has multiple subqueries that beg to be factored out as common table expressions, or even simply as top-level aggregates. It also expresses some WHERE predicates that are wholly redundant with the foreign-key constraints on the base tables. And it uses opaque table aliases instead of meaningful ones.

The original query also has some very suspicious structure:

Subqueries B and C each select from table Employee, but none of the selected columns actually comes from that table. All are the results of uncorrelated aggregate (sub)queries, so B and C will each return one row per Employee row that passes the WHERE filter, all identical within each subquery. Then all those unneeded duplicates are removed again when the UNION operator eliminates duplicate rows.
You have a GROUP BY clause on the outermost query, but no aggregate functions in that query's selection list. Perhaps you wanted to ORDER BY those columns instead, but if not then the GROUP BY is altogether useless.
You are converting dates to strings to compare them; that's not necessarily wrong for equality comparisons, but it's inefficient. It is wrong for greater-than and less-than comparisons, however, and therefore it's also wrong for use with MIN() and MAX(). It will work well enough, though, to fool you by producing the right results in some cases.
You perform a UNION of two subqueries with identical structure, differing only in some query predicates. This begs to be combined into a single query.
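
To illustrate the point about comparing dates as strings: in style 103 (dd/mm/yyyy) the day comes first, so text order does not follow calendar order. A minimal sketch with two made-up dates (the variable names are hypothetical):

```sql
-- Hypothetical values, chosen only to show the ordering problem.
DECLARE @earlier datetime = '20141120';  -- 20 November 2014
DECLARE @later   datetime = '20141203';  -- 3 December 2014

SELECT
  -- Compared as datetime values: calendar order.
  CASE WHEN @earlier < @later
       THEN 'earlier first' ELSE 'later first' END AS AsDatetime,
  -- Compared as style-103 strings: '20/11/2014' versus '03/12/2014'.
  CASE WHEN CONVERT(nvarchar(10), @earlier, 103) < CONVERT(nvarchar(10), @later, 103)
       THEN 'earlier first' ELSE 'later first' END AS AsString103;
```

As datetime values the November date sorts first; as strings, '20/11/2014' sorts after '03/12/2014' because the comparison starts with the day digits.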

It will surely help to start by simplifying the original query. It looks like this will produce the same data, except with a Staff column added and possibly in a different order:

SELECT
  emp.Staff,
  COUNT(DISTINCT emp.id) AS TotalEmp,
  COUNT(DISTINCT att.id) AS TotalAttendance,
  MIN(att.timeInDate) AS TimeIn,
  MAX(att.timeOutDate) AS TimeOut
FROM
  Employee emp
  LEFT JOIN Attendance att
    ON att.Id = emp.Id
    AND CAST(att.timeInDate AS DATE) = CONVERT(DATE, '20/11/2014', 103)
WHERE
  emp.Staff = 1 OR emp.Staff = 2
GROUP BY emp.Staff

Note that it does group by Staff; this eliminates the need for a UNION, while still preserving the per-Staff aggregate values (indeed, that's the whole point of GROUP BY). Note, too, that if 1 and 2 are the only possible values for Employee.Staff, or if you're ok with getting results for other values, too, then you can simplify further by removing the WHERE condition restricting the results to only those values.

Note also that your Datetime values are converted to Date to strip off the time portion; this is much more efficient than formatting them as strings. Your literal date string is converted to a Date for comparison (using format 103).
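
If index use on timeInDate matters to you, a common alternative to casting the column is a half-open range that leaves the column bare in the predicate. This is only a sketch: it assumes timeInDate is a datetime column (as your CONVERT calls suggest), and @day is a hypothetical variable introduced for illustration.

```sql
-- @day is a hypothetical variable holding the day of interest.
DECLARE @day date = CONVERT(date, '20/11/2014', 103);

SELECT att.Id, att.timeInDate, att.timeOutDate
FROM Attendance att
WHERE att.timeInDate >= @day                    -- from midnight on that day...
  AND att.timeInDate <  DATEADD(day, 1, @day);  -- ...up to, but excluding, the next midnight
```

Both forms select the same rows; which one the optimizer handles better depends on your indexes and server version, so it is worth checking the plan rather than assuming.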

That serves as a much better starting point, as the structure of the data and the nature of the grouping are clear. And it is so much simpler! Now if you want to split the groups differently, it's pretty easy to do so.
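
One detail to watch when a date test meets a LEFT JOIN: a filter on the Attendance side behaves differently in the WHERE clause than in the ON clause. A small sketch with made-up table variables (@Emp and @Att are hypothetical stand-ins for your tables):

```sql
-- Hypothetical miniature data: two employees, only one has attendance.
DECLARE @Emp TABLE (Id int, Staff int);
DECLARE @Att TABLE (Id int, timeInDate datetime);
INSERT INTO @Emp VALUES (1, 2), (2, 2);
INSERT INTO @Att VALUES (1, '20141120 08:00');

-- Date test in WHERE: the unmatched employee's NULL attendance row fails
-- the test, so only employees with attendance that day are counted.
SELECT e.Staff, COUNT(DISTINCT e.Id) AS TotalEmp
FROM @Emp e
  LEFT JOIN @Att a ON a.Id = e.Id
WHERE CAST(a.timeInDate AS date) = '20141120'
GROUP BY e.Staff;

-- Date test in ON: every employee is kept; only the attendance side is filtered.
SELECT e.Staff, COUNT(DISTINCT e.Id) AS TotalEmp
FROM @Emp e
  LEFT JOIN @Att a
    ON a.Id = e.Id
   AND CAST(a.timeInDate AS date) = '20141120'
GROUP BY e.Staff;
```

The first query reports one employee for Staff 2 and the second reports two. If TotalEmp is meant to count all employees in the group, as in your original query, the date test belongs in the ON clause.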

In particular, something like this should do what you want:

SELECT
  pos.position AS position,
  COUNT(DISTINCT emp.id) AS TotalEmp,
  COUNT(DISTINCT att.id) AS TotalAttendance,
  MIN(att.timeInDate) AS TimeIn,
  MAX(att.timeOutDate) AS TimeOut
FROM
  Employee emp
  JOIN Position pos ON emp.position_id = pos.positionId
  LEFT JOIN Attendance att
    ON att.Id = emp.Id
    AND CAST(att.timeInDate AS DATE) = CONVERT(DATE, '20/11/2014', 103)
GROUP BY pos.position

That relies on the fact that each position is associated with exactly one Staff value, so that it gains nothing to group by Staff as well.
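
If that one-to-one relationship does not hold in your data, or you simply want Staff visible in the output as the question asks, both columns can go in the GROUP BY. A sketch only: the join columns emp.position_id and pos.positionId are guesses, since the question does not show the Position table, and the date test is placed in the ON clause so that employees without attendance still count toward TotalEmp.

```sql
SELECT
  pos.position AS Position,
  emp.Staff,
  COUNT(DISTINCT emp.Id) AS TotalEmp,
  COUNT(DISTINCT att.Id) AS TotalAttendance,
  MIN(att.timeInDate) AS TimeIn,
  MAX(att.timeOutDate) AS TimeOut
FROM
  Employee emp
  -- Assumed join columns; adjust to the real Position schema.
  JOIN Position pos ON emp.position_id = pos.positionId
  LEFT JOIN Attendance att
    ON att.Id = emp.Id
    AND CAST(att.timeInDate AS DATE) = CONVERT(DATE, '20/11/2014', 103)
GROUP BY pos.position, emp.Staff
ORDER BY pos.position, emp.Staff;
```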

solution1
0 ACCPTED 2015-03-13 16:31:04
