简体   繁体   中英

Understanding SQL query for finding item count for each category nested set model

I've been reading this and have understood all the queries until the one for counting items in each category. I have pasted the query and the result here.

    SELECT parent.name, COUNT(product.name)
    FROM nested_category AS node ,
            nested_category AS parent,
            product
    WHERE node.lft BETWEEN parent.lft AND parent.rgt
            AND node.category_id = product.category_id
    GROUP BY parent.name
    ORDER BY node.lft;

    +----------------------+---------------------+
    | name                 | COUNT(product.name) |
    +----------------------+---------------------+
    | ELECTRONICS          |                  10 |
    | TELEVISIONS          |                   5 |
    | TUBE                 |                   2 |
    | LCD                  |                   1 |
    | PLASMA               |                   2 |
    | PORTABLE ELECTRONICS |                   5 |
    | MP3 PLAYERS          |                   2 |
    | FLASH                |                   1 |
    | CD PLAYERS           |                   2 |
    | 2 WAY RADIOS         |                   1 |
    +----------------------+---------------------+

What exactly is going on? There are two copies of the nested_category table named node and parent. In the case of the example, there are 10 items total so there are 10 "nodes" selected since node.category_id = product.category_id . Then there exists at least one parent for each node. But I do not understand the COUNT(product.name) part of the query. How does that give you the item count for each category? I'm not very skilled in sql.

EDIT: I have found a flaw in this code. It will not display categories with zero items. This seems to fix it , but I do not understand it fully either.

Take a look at the GROUP BY clause - this explains the count, I'll run it through:

Ignoring the actual SELECT for a moment, the first part of the FROM clause is the left hand input nested_category which is aliased as 'parent'

This is then joined onto itself nested_category as the right hand input and aliased as node , but the clause gives all rows from the right hand input that have left/right values between the left hand inputs left/right values:

You end up with something like this (selecting all the relevant columns):

Parent.Id    Parent.Left  Parent.Right  Node.Left  Node.Right
1             1            10            2          3
1             1            10            4          9
1             1            10            5          6
1             1            10            7          8
2             4            9             5          6
2             4            9             7          8

Notice how you get the Parent item multiple times (parent * number of children)

This would represent the following hierarchy:

  1/10 + 
       |-> 2/3
       |-> 4/9 +
               |-> 5/6
               |-> 7/8

Essentially for every row in the left input you get all the child rows in the right input

The grouping then kicks in - the GROUP BY clause groups by the left hand input ( Parent.Name which is probably not the best thing to group on unless Parent.Name is unique!) and then counts the number of rows in the right hand input - so in the case of the above it would give you

Parent ID 1 (item 1/10) = 4 rows
Parent ID 2 (item 4/9) = 2 rows

When you GROUP (or aggregate) you can only select items that appear in the GROUP BY clause otherwise you must use an aggregate function such as SUM, COUNT, AVG etc

Edit: The reason it give you none of the categories with 0 items is that the JOIN says 'only return rows from the right input which have left/right values that are between any of the rows in the left input'. Anything that has nothing between the left/right values will be filtered out by the join

Hope this helps explain it

lol @strawberry's comment

Quick answer...your table nested_category is doing a self join to itself to pick out it's 'parent'. Self joins like this are semi common in databases where one record at the 'parent' level will have it's child records in the same table. In some setups, the children items will be in a different table (query looks the same, except it will refer to the different table instead of the same table as we have here). Don't be afraid of self joins...think of it as two completely seperate tables "node" and "parent". Admittadely they have the same structure and same data, but best to think of them as seperate tables.

So first it's grabbing all records as 'nodes' and finding it's parent product (parent products won't have a parent entry, so the join from nodes to parents drop off parent records that have no associated parent record...if that makes sense).

It then joins to the product table on the node.category_id = product.category_id. This will create a line for every product (5 products assoicaited to 1 node will be 5 lines).

The group by statement here takes all those lines and counts them up by the parent level product. It is possible that one parent has 5 nodes which each have 5 products...in this case, it will count all 25 records (5 nodes * 5 products) to that one parent record

Of course, a node with 0 products is dropped by this : node.category_id = product.category_id...and as such it won't appear on the list. To have 0 entries appear, you must first make a list of parent items and then left join it to your statement here...the left join will preserve the 0's.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM