
Basic usage of GROUP BY in SQL

I'm struggling to understand the use of GROUP BY in this query and am looking for clarification:

Flights(flno: integer, from: string, to: string, distance: integer, departs: time, arrives: time, price: real)
Aircraft(aid: integer, aname: string, cruisingrange: integer)
Certified(eid: integer, aid: integer)
Employees(eid: integer, ename: string, salary: integer)

The question is: For all aircraft with cruising range over 1000 miles, find the name of the aircraft and the average salary of all pilots certified for this aircraft.

SELECT Temp.name, Temp.AvgSalary
FROM ( SELECT A.aid, A.aname AS name,
              AVG (E.salary) AS AvgSalary
       FROM Aircraft A, Certified C, Employees E
       WHERE A.aid = C.aid AND
             C.eid = E.eid AND A.cruisingrange > 1000
       GROUP BY A.aid, A.aname ) AS Temp

Why is the GROUP BY necessary here? Wouldn't the following query return the aircraft and the corresponding salary, or would it return the average salary of all employees not specific to each aircraft?

       SELECT A.aname, AVG(E.salary)
       FROM Aircraft A, Certified C, Employees E
       WHERE A.aid = C.aid AND
             C.eid = E.eid AND A.cruisingrange > 1000

Does using GROUP BY change the format of the table so that using GROUP BY A.aid would specify that we are only grouping the aircraft table and leaving the certified and employee tables untouched?

The GROUP BY is required to perform the aggregation (in this case, taking the average) properly.

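If you don't group by anything, the aggregation is performed over the entire joined result. In other words, your last query would return a single row: the average salary across all pilot certifications for aircraft with a cruising range over 1000, with no distinction of which aircraft is which. Note that because it also selects A.aname, standard SQL (and MySQL with ONLY_FULL_GROUP_BY enabled, the default since 5.7.5) rejects the query outright, while older MySQL configurations run it and pair that single average with an arbitrary aircraft name. Try it, and you will see this behavior.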

However, if you use the GROUP BY clause here, you will see the average for each individual aircraft with a cruising range over 1000, which is what you want. Without it, you're averaging over the pilots of all those aircraft combined.

Try these queries on some sample data, and the difference in behavior will become much clearer.
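As a rough way to try this out, here is a minimal sketch with made-up data. The table and column names come from the question's schema; the rows, names, and salaries are purely hypothetical. It should run on MySQL, PostgreSQL, or SQLite, though the exact formatting of the averages differs between engines.

```sql
-- Hypothetical sample data; only the table and column names come from the question.
CREATE TABLE Aircraft  (aid INTEGER PRIMARY KEY, aname VARCHAR(30), cruisingrange INTEGER);
CREATE TABLE Employees (eid INTEGER PRIMARY KEY, ename VARCHAR(30), salary INTEGER);
CREATE TABLE Certified (eid INTEGER, aid INTEGER);

INSERT INTO Aircraft  VALUES (1, 'Boeing 747', 8000), (2, 'DC10', 6000), (3, 'Piper Cub', 500);
INSERT INTO Employees VALUES (10, 'Ann', 100000), (11, 'Bob', 80000), (12, 'Cy', 60000);
INSERT INTO Certified VALUES (10, 1), (11, 1), (11, 2), (12, 2), (12, 3);

-- With GROUP BY: one row per aircraft.
-- Boeing 747 averages Ann and Bob (90000); DC10 averages Bob and Cy (70000).
SELECT A.aname, AVG(E.salary) AS AvgSalary
FROM Aircraft A, Certified C, Employees E
WHERE A.aid = C.aid AND
      C.eid = E.eid AND A.cruisingrange > 1000
GROUP BY A.aid, A.aname;

-- Without GROUP BY (and without A.aname in the select list): a single row.
-- The four joined rows (100000, 80000, 80000, 60000) average to 80000.
SELECT AVG(E.salary) AS AvgSalary
FROM Aircraft A, Certified C, Employees E
WHERE A.aid = C.aid AND
      C.eid = E.eid AND A.cruisingrange > 1000;
```

The Piper Cub is filtered out by the WHERE clause before any grouping happens, so it appears in neither result.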


EDIT

Regarding your last few statements: GROUP BY does not change the Certified or Employees tables, or any other table. It works on the rows produced by the join and the WHERE clause, collecting them into one group per aircraft, and AVG is then computed within each group. To step back, the problem states "for each aircraft". Often, if a problem statement spells out the group of items for which you need results, that is a good starting point for your GROUP BY clause.

It is something of an instinct when writing AVG(...) in SQL: use GROUP BY to specify the columns that define each group to average over. Without a GROUP BY clause, all E.salary values are treated as one group and a single average is returned.

Any time you select aggregate functions such as AVG, SUM, MAX, or MIN alongside other columns, you must group by all columns that are not aggregate functions or constants. The only exception to this I can think of is when you use window functions (available in MySQL only from version 8.0).
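To illustrate the window-function exception, here is a small sketch on hypothetical data (the rows are invented; only the schema comes from the question). It assumes an engine with window function support, such as PostgreSQL, MySQL 8.0 or later, or SQLite 3.25 or later.

```sql
-- Hypothetical sample data; only the table and column names come from the question.
CREATE TABLE Aircraft  (aid INTEGER PRIMARY KEY, aname VARCHAR(30), cruisingrange INTEGER);
CREATE TABLE Employees (eid INTEGER PRIMARY KEY, ename VARCHAR(30), salary INTEGER);
CREATE TABLE Certified (eid INTEGER, aid INTEGER);

INSERT INTO Aircraft  VALUES (1, 'Boeing 747', 8000), (2, 'DC10', 6000);
INSERT INTO Employees VALUES (10, 'Ann', 100000), (11, 'Bob', 80000), (12, 'Cy', 60000);
INSERT INTO Certified VALUES (10, 1), (11, 1), (11, 2), (12, 2);

-- No GROUP BY: every pilot/aircraft row is kept, and the per-aircraft
-- average is repeated on each row of that aircraft's partition.
SELECT A.aname, E.ename, E.salary,
       AVG(E.salary) OVER (PARTITION BY A.aid) AS AvgSalary
FROM Aircraft A, Certified C, Employees E
WHERE A.aid = C.aid AND
      C.eid = E.eid AND A.cruisingrange > 1000;
```

This returns four rows rather than two, which is why non-aggregated columns such as E.ename are allowed here. For the original question, where one row per aircraft is wanted, GROUP BY remains the more natural fit.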

In this example, I'm unclear as to why A.aid is not selected from Temp. If there are two aircraft with the same name but different ids, you could see results like...

aname   avg
------  -------
747     100,000
747     110,000
DC10     90,000

...where the two records are for different aircraft with the same name (747).
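A sketch of that situation, using hypothetical rows chosen to match the figures above (two distinct aircraft both named 747). Selecting the id alongside the name makes the duplicate-looking rows distinguishable, while grouping by name alone would silently merge them.

```sql
-- Hypothetical sample data; only the table and column names come from the question.
CREATE TABLE Aircraft  (aid INTEGER PRIMARY KEY, aname VARCHAR(30), cruisingrange INTEGER);
CREATE TABLE Employees (eid INTEGER PRIMARY KEY, ename VARCHAR(30), salary INTEGER);
CREATE TABLE Certified (eid INTEGER, aid INTEGER);

INSERT INTO Aircraft  VALUES (1, '747', 8000), (2, '747', 8000), (3, 'DC10', 6000);
INSERT INTO Employees VALUES (10, 'Ann', 100000), (11, 'Bob', 110000), (12, 'Cy', 90000);
INSERT INTO Certified VALUES (10, 1), (11, 2), (12, 3);

-- Grouping by id and name, and selecting the id: three rows,
-- with the two 747s told apart by aid (averages 100000 and 110000).
SELECT A.aid, A.aname, AVG(E.salary) AS AvgSalary
FROM Aircraft A, Certified C, Employees E
WHERE A.aid = C.aid AND
      C.eid = E.eid AND A.cruisingrange > 1000
GROUP BY A.aid, A.aname;

-- Grouping by name only: the two 747s collapse into one row
-- averaging 105000, which is not "per aircraft" any more.
SELECT A.aname, AVG(E.salary) AS AvgSalary
FROM Aircraft A, Certified C, Employees E
WHERE A.aid = C.aid AND
      C.eid = E.eid AND A.cruisingrange > 1000
GROUP BY A.aname;
```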

The GROUP BY here says to average the salaries per aircraft; that is, each average includes only the salaries of the pilots certified for that one aircraft.


 