简体   繁体   中英

Left Outer Join with Missing Records

I'm selecting data from several tables, but the main idea is that a product may or may not have a discount record associated with it, as either a percent off or dollar off amount. I'm using a left outer join (which may be incorrect) and am getting back the same values for dollar and percent values, regardless as to whether or not the records exist.

The query looks something like:

SELECT Items.ItemID, Items.name, Items.price, 
       ItemDiscounts.percentOff, ItemDiscounts.dollarOff, 
       ItemAttributes.ColorName, ItemStuff.StuffID
FROM Items, ItemAttributes, ItemStuff
LEFT OUTER JOIN ItemDiscounts
  ON ItemDiscounts.ItemID = ItemID
  AND (
    ItemDiscounts.percentOff > 0
    OR ItemDiscounts.dollarOff > 0
  )
WHERE Items.ItemID = ItemAttributes.ItemID
AND ItemStuff.ItemID = Items.ItemID
GROUP BY ItemStuff.StuffID

The weird part is that in all results, percentOff returns "1", and dollarOff returns "0", regardless if each item has it's own associated discount record. For spits, I changed ItemDiscounts.percentOff > 0 to ItemDiscounts.percentOff > 1 , then dollarAmount changed to all 2's and percentOff was all 0's.

I'm somewhat baffled on this, so any help would be appreciated.

You have an unqualified reference to ItemID in your ON clause... not clear why that's not throwing an "ambiguous column" exception. (Apparently, it's not ambiguous to MySQL, and MySQL is determining which ItemId is being referenced, the odds are good that its not the one you intended.

Also, your query includes references to the ItemStuff rowsource, but there is no such rowsource shown in your query.

I also suspect that the behavior of the GROUP BY that is giving you a result set that doesn't meet your expectation. (More than likely, right now, it's masking the real problem in your query, which could be a CROSS JOIN operation that you didn't intend.

I suggest you try your query without the GROUP BY clause, and confirm the resultset is what you would expect absent the GROUP BY clause.

NOTE: Most other relational database engines will throw an exception with a GROUP BY like you show in your query. They (basically) require that every non-aggregate in the SELECT list be included in the GROUP BY . You can get MySQL to behave the same way (with some particular settings of sql_mode.) MySQL is more liberal, but the result set you get back may not conform to your expectation.

NOTE: I don't see how this query is passing a semantics check, and is returning any resultset at all, given the references to a non-existent ItemStuff rowsource.


For improved readability, I recommend that you not use the comma as the join operator, and that you instead use the JOIN keyword. I also recommend that you move the join predicates from the WHERE clause to an ON clause. I also prefer to give an alias to each rowsource, and use that alias to qualify the columns from it.

Given what you show in your query, I'd write (the parts I can make sense of) like this:

SELECT i.ItemID
     , i.name
     , i.price
     , d.percentOff
     , d.dollarOff
     , a.ColorName
  FROM Items i 
  JOIN ItemAttributes a
    ON a.ItemID = i.ItemID
  LEFT 
  JOIN ItemDiscounts d
    ON d.ItemID = i.ItemID
       AND ( d.percentOff > 0 OR d.dollarOff > 0 )

I've omitted ItemStuff.StuffID from the SELECT list, because I don't see any ItemStuff rowsource.

I also exclude the WHERE clause, because I don't see any ItemStuff rowsource in your query.

-- WHERE ItemStuff.ItemID = i.ItemID

I omit the GROUP BY because, again, I don't see any ItemStuff rowsource in your query, and because the behavior of the GROUP BY is likely not what I expect, but is rather masking a problem in my query.

-- GROUP BY ItemStuff.StuffID

@Kyle, the fact your query "timed out" leads me to believe you are generating WAY MORE rows than you expect, like you have a Cartesian product (every row from a table is being "matched" to every row in some other table... 10,000 rows in one table, 10,000 rows in the other table, that will generate 100,000,000 rows.

I think the GROUP BY clause is masking the real problem.

I recommend that for development, you include the PRIMARY KEY of each table as the leading columns in your result set. I would add some reasonable predicates to the driving table (eg i.ItemID IN (2,3,5,7) to limit the size of the result set, and ORDER BY the primary key... that should help you identify an unintended Cartesian product.

Do you get what you want when you remove these lines from your query?

AND ( ItemDiscounts.percentOff > 0 OR ItemDiscounts.dollarOff > 0 )

Once you specify an absolute value for the possibly-null side of an outer join, your WHERE clause has to account for it.

Try it with the following clause:

AND (
ItemDiscounts.percentOff > 0
OR ItemDiscounts.percentOff is null
OR ItemDiscounts.dollarOff > 0
OR ItemDiscounts.dollarOff is null
)

Additionaly, you're specifying a GROUP BY without an aggregate. This makes no sense in most cases. You probably want ORDER BY to sort.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM