
Need help understanding query

This is the schema for the database:

Suppliers(sid:integer, sname:string, address:string)

Parts(pid:integer,pname:string,color:string)

Catalog(sid:integer,pid:integer,cost:real)

Objective: For each part, find the sname of the supplier who charges the most for that part.

Professor's code:

SELECT P.pid, S.sname
FROM Parts P, Suppliers S, Catalog C
WHERE C.pid = P.pid AND C.sid = S.sid
AND C.cost = (SELECT MAX (C1.cost)
                         FROM Catalog C1
                         WHERE C1.pid = P.pid)

Now, in general I am very new to SQL, so I have been struggling trying to understand conceptually how queries work. Looking at the query above, I am confused as to how the subquery works exactly. I know if the subquery was just

SELECT MAX (C1.cost)
FROM Catalog C1

It would simply return the maximum cost in the Catalog table. But this has the condition WHERE C1.pid = P.pid, and this is where my mind stops working. We want the maximum cost FOR EACH PART. Conceptually, how does the SQL query know to look at each pid individually? Does adding the WHERE clause make everything behave like a loop (as in a regular programming language)? Meaning it will go down the list of pids, find the max cost for each pid, return it to compare with C.cost, and then move on to the next pid? Or how exactly is this all happening conceptually? (Something is missing in my mental picture of HOW or WHEN it goes through each pid.)

I've been asking some very vague questions about SQL lately because for some reason I am struggling a lot more to find...good resources to really understand some of these foundations as compared to other programming languages, and I keep getting people voting to close my questions, but if anyone could at least tell me how I could word my questions better or direct me somewhere to better understand this, I would greatly appreciate it.

My first recommendation would be to NOT think about SQL as a procedural programming language. If you start thinking about loops (outside of maybe recursive SQL) or if statements (outside of CASE expressions), you are going to end up in a bad place. Instead, think of sets of data: "This part of the query gets this set of data." SQL is a language that creates and works with sets of data.

As for this query, you could put it in English as: "Give me the pid of each part and the sname of the supplier that charges the highest cost for it. Furthermore, the part must appear in the Parts, Catalog, and Suppliers tables."

The assumption here is that your Catalog table might have more than one entry per part and the costs may differ between those entries. So we use a correlated subquery to determine which one of those costs is the highest.

I added the bit about the part needing to be in all three tables just to point out that this query uses implicit INNER JOINs. I'm not a big fan of the implicit form and always prefer that people write out "INNER JOIN ... ON ..." in the FROM part of the query. The implicit INNER JOIN is more old-school.
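To make the implicit vs. explicit join point concrete, here is a minimal sketch that runs both spellings against an in-memory SQLite database from Python. The sample rows (Acme, Bolt Co, Cog Ltd and their prices) are made up for illustration, and SQLite is just an assumption so the example can run anywhere; the question does not say which database is used.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Suppliers(sid INTEGER, sname TEXT, address TEXT);
CREATE TABLE Parts(pid INTEGER, pname TEXT, color TEXT);
CREATE TABLE Catalog(sid INTEGER, pid INTEGER, cost REAL);
INSERT INTO Suppliers VALUES (1, 'Acme', '1 Main St'),
                             (2, 'Bolt Co', '2 Oak Ave'),
                             (3, 'Cog Ltd', '3 Elm Rd');
INSERT INTO Parts VALUES (10, 'nut', 'red'),
                         (20, 'bolt', 'green'),
                         (30, 'washer', 'blue');   -- no Catalog entry for part 30
INSERT INTO Catalog VALUES (1, 10, 5.0), (2, 10, 7.5),
                           (1, 20, 1.0), (2, 20, 3.0), (3, 20, 3.0);
""")

# Implicit joins: tables listed with commas, join conditions in WHERE.
implicit_sql = """
SELECT P.pid, S.sname
FROM Parts P, Suppliers S, Catalog C
WHERE C.pid = P.pid AND C.sid = S.sid
  AND C.cost = (SELECT MAX(C1.cost) FROM Catalog C1 WHERE C1.pid = P.pid)
"""

# Explicit joins: same conditions, moved into INNER JOIN ... ON ...
explicit_sql = """
SELECT P.pid, S.sname
FROM Parts P
INNER JOIN Catalog C ON C.pid = P.pid
INNER JOIN Suppliers S ON S.sid = C.sid
WHERE C.cost = (SELECT MAX(C1.cost) FROM Catalog C1 WHERE C1.pid = P.pid)
"""

implicit_rows = sorted(conn.execute(implicit_sql).fetchall())
explicit_rows = sorted(conn.execute(explicit_sql).fetchall())
print(implicit_rows)
print(explicit_rows)
```

Both queries should return the same rows. Note that part 30 never shows up because it has no Catalog row (that is the INNER JOIN at work), and part 20 shows up twice because two suppliers tie for the highest cost.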

Going back to thinking about this in sets of data: imagine we join all three tables and return all of the fields. Each part, conceptually, might have more than one record in the result set, with differing costs distinguishing one record from the next. So we add the subquery and the constraint that the cost in our record set must be the highest cost we can find in the catalog for that particular part (the subquery's WHERE clause).
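The "sets of data" view can be sketched in three steps: the joined set, the set of highest costs per part, and the rows of the first set that match the second. This is only an illustration using Python's sqlite3 module with invented sample rows; the GROUP BY query is a stand-in I'm using to show the per-part maximum as its own set, not something from the professor's code.

```python
import sqlite3

# Hypothetical sample data; only the schema comes from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Suppliers(sid INTEGER, sname TEXT, address TEXT);
CREATE TABLE Parts(pid INTEGER, pname TEXT, color TEXT);
CREATE TABLE Catalog(sid INTEGER, pid INTEGER, cost REAL);
INSERT INTO Suppliers VALUES (1, 'Acme', '1 Main St'),
                             (2, 'Bolt Co', '2 Oak Ave'),
                             (3, 'Cog Ltd', '3 Elm Rd');
INSERT INTO Parts VALUES (10, 'nut', 'red'), (20, 'bolt', 'green');
INSERT INTO Catalog VALUES (1, 10, 5.0), (2, 10, 7.5),
                           (1, 20, 1.0), (2, 20, 3.0), (3, 20, 3.0);
""")

# Set 1: every (part, supplier, cost) combination produced by the join alone.
joined = conn.execute("""
    SELECT P.pid, S.sname, C.cost
    FROM Parts P, Suppliers S, Catalog C
    WHERE C.pid = P.pid AND C.sid = S.sid
    ORDER BY P.pid, S.sname
""").fetchall()

# Set 2: the highest cost per part, which is what the subquery supplies for each pid.
max_cost = dict(conn.execute(
    "SELECT pid, MAX(cost) FROM Catalog GROUP BY pid"
).fetchall())

# Result: rows of set 1 whose cost equals the maximum for their own part.
winners = [(pid, sname) for pid, sname, cost in joined if cost == max_cost[pid]]

print(joined)
print(max_cost)
print(winners)
```

Printing `joined` makes it easy to see that each part appears several times with different costs, and that the cost condition is what narrows those rows down to the most expensive supplier(s).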

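What you have here is a correlated subquery. Because the subquery references P.pid from the main query, it is conceptually evaluated once for every row of the main query, using that row's pid. (The database is free to optimize how it actually does this, but the result is the same as if it had.)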
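If it helps to see the "once per row" idea spelled out, here is a rough mental model in plain Python. It is only a sketch of the query's meaning, not how a real database executes it, and the rows and the helper name `max_cost_for` are invented for illustration.

```python
# Hypothetical rows, laid out in the same column order as the schema.
suppliers = [(1, "Acme", "1 Main St"), (2, "Bolt Co", "2 Oak Ave"), (3, "Cog Ltd", "3 Elm Rd")]
parts = [(10, "nut", "red"), (20, "bolt", "green"), (30, "washer", "blue")]
catalog = [(1, 10, 5.0), (2, 10, 7.5), (1, 20, 1.0), (2, 20, 3.0), (3, 20, 3.0)]


def max_cost_for(pid):
    """Plays the role of the subquery: MAX(C1.cost) over Catalog rows with C1.pid = pid."""
    costs = [cost for (_sid, c_pid, cost) in catalog if c_pid == pid]
    return max(costs) if costs else None


result = []
# FROM Parts P, Suppliers S, Catalog C: consider every combination of rows.
for (p_pid, _pname, _color) in parts:
    for (s_sid, sname, _address) in suppliers:
        for (c_sid, c_pid, cost) in catalog:
            # WHERE C.pid = P.pid AND C.sid = S.sid AND C.cost = (subquery for this P.pid)
            if c_pid == p_pid and c_sid == s_sid and cost == max_cost_for(p_pid):
                result.append((p_pid, sname))  # SELECT P.pid, S.sname

result.sort()
print(result)
```

The call `max_cost_for(p_pid)` is the correlation: the inner computation takes its pid from whichever outer row is currently being considered, which is exactly what WHERE C1.pid = P.pid expresses.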


 