
Filtering out anything older than a year before the MAX() date of grouped rows

I'd begun posting something about this a few days ago when I thought I may've found a good solution thanks to reading a few of the suggested posts. Unfortunately we found out today that there are some data gaps. As far as I can tell it's owing to the query.

I'm attempting to grab a year's worth of spend data for items. If an item was purchased multiple times throughout the year, I group on it and SUM() the quantities. The idea is to get a single row with all the needed info instead of 2, 10, or 100 rows, one for each transaction that occurred.

The catch is that I don't want a year's worth of data counted back from today. I want a year's worth that ends at the MAX(date) for that company's batch of spend. In other words, maybe a product was last purchased 3 months ago, but the company's last known transaction/PO was only 1 month ago. The date range for that company's data should then run from 1 year before that last transaction up to 1 month ago.

Here is what I've currently got:

SELECT id, groupId, datePurchase, vendor, venItem, item, itemDesc, uom,
       SUM(qtyPurchase) AS `qtyPurchase`, SUM(extPrice) AS `totalPrice`,
       MAX(unitPrice) AS `maxPrice`, unitPrice, code, stripped_venitem,
       MAX(datePurchase) as `maxDate`, SUM(extPrice) as `extPrice`, PONum
FROM transactions t
WHERE (((qtyPurchase)!=0))
GROUP BY groupId, vendor, venItem, item, itemDesc, uom
HAVING datePurchase > (MAX(datePurchase) - INTERVAL 1 YEAR);

If I remove the HAVING portion, the data in question shows up. It's not that its date falls outside the prescribed one-year window, though. When I remove the HAVING clause and find the product we're having a problem with, the maxDate value is "2015-08-26".

This is all part of an import to Solr so I need to get it right within a single query, no extra tricks or processing. Thanks for any insight!


EDIT 1: It's worth noting, too, that I cannot move the HAVING datePurchase > (MAX(datePurchase) - INTERVAL 1 YEAR) condition into the WHERE clause. If injected directly, it throws an "invalid use of group function" error, and if it is added as a subquery ( ...AND datePurchase > (SELECT MAX(datePurchase)... ) then it uses the MAX(datePurchase) of the entire dataset, not that of the particular groupId it should be related to.

That is worth stressing as well: the max date should be related to each particular entity's (listed as groupId ) batch of transactions, not the entire table.
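One way to tie the cutoff to each groupId rather than to the whole table is to compute the per-group maximum in a derived table, join it back, and filter rows in WHERE before aggregating. The sketch below is a hedged illustration, not a drop-in fix. It uses Python's built-in sqlite3 module so it can run anywhere, a hypothetical cut-down transactions table with invented sample rows, and illustrative aliases (groupMaxDate, totalQty). SQLite's date(x, '-1 year') stands in for MySQL's x - INTERVAL 1 YEAR. It also rests on an assumption about the cause of the gaps: HAVING is evaluated once per group after aggregation, and with ONLY_FULL_GROUP_BY disabled the bare datePurchase in it comes from whichever row MySQL picks for the group, so a whole group can be dropped when that row happens to be an old one.

```python
import sqlite3

# Hypothetical, cut-down schema: only the columns needed to show the idea.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "groupId INTEGER, item TEXT, datePurchase TEXT, qtyPurchase INTEGER)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [
        (1, "widget", "2014-08-01", 5),  # over a year before group 1's latest
        (1, "widget", "2015-01-10", 3),
        (1, "widget", "2015-08-26", 2),  # group 1's latest transaction
        (1, "gadget", "2015-05-26", 4),  # item last bought 3 months earlier
        (2, "widget", "2013-02-01", 7),  # over a year before group 2's latest
        (2, "widget", "2013-06-01", 1),
        (2, "widget", "2014-03-01", 1),  # group 2's latest transaction
    ],
)

QUERY = """
SELECT t.groupId,
       t.item,
       SUM(t.qtyPurchase) AS totalQty,
       MAX(t.datePurchase) AS maxDate
FROM transactions t
JOIN (
    -- one row per groupId holding that group's latest purchase date
    SELECT groupId, MAX(datePurchase) AS groupMaxDate
    FROM transactions
    WHERE qtyPurchase != 0
    GROUP BY groupId
) m ON m.groupId = t.groupId
WHERE t.qtyPurchase != 0
  -- row-level filter; in MySQL: t.datePurchase > m.groupMaxDate - INTERVAL 1 YEAR
  AND t.datePurchase > date(m.groupMaxDate, '-1 year')
GROUP BY t.groupId, t.item
ORDER BY t.groupId, t.item
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

With the sample rows above, the 2014-08-01 and 2013-02-01 purchases fall outside their own group's window and are left out of the sums, while the gadget that was last bought three months before group 1's latest transaction is still returned. Because the filter runs per row before GROUP BY, it stays a single query, which should suit the Solr import.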

I'm guessing it was a date-arithmetic issue. I changed the HAVING datePurchase > (MAX(datePurchase) - INTERVAL 1 YEAR) to HAVING datePurchase > (MAX(datePurchase) - DATE_SUB(MAX(datePurchase), INTERVAL 1 YEAR)) and voilà!
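It may be worth checking what that rewritten expression evaluates to before relying on it. The sketch below is a rough model, assuming MySQL converts a DATE used in numeric context to a YYYYMMDD number; the helper as_mysql_number is hypothetical and only imitates that conversion. Under that assumption, subtracting the two dates yields a small number rather than a cutoff date, so the comparison would pass for practically any purchase date and the HAVING clause would no longer filter anything.

```python
from datetime import date


def as_mysql_number(d: date) -> int:
    """Hypothetical helper: imitate a DATE read as a YYYYMMDD number."""
    return d.year * 10000 + d.month * 100 + d.day


max_date = date(2015, 8, 26)  # the maxDate value quoted in the question
# Stand-in for DATE_SUB(MAX(datePurchase), INTERVAL 1 YEAR).
# (replace() would raise for Feb 29; fine for this fixed example.)
one_year_before = max_date.replace(year=max_date.year - 1)

# MAX(datePurchase) - DATE_SUB(MAX(datePurchase), INTERVAL 1 YEAR)
threshold = as_mysql_number(max_date) - as_mysql_number(one_year_before)

# Even a purchase from years earlier compares as greater than the threshold.
old_purchase = date(2010, 1, 1)
passes = as_mysql_number(old_purchase) > threshold

print(threshold, passes)
```

If this model holds, the missing rows reappear because the condition is always true, not because the one-year window is being applied, so rows older than a year would be summed in as well.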
