
How to select rows based on two columns creating an identifier and the max date

My query returns six columns. One of them I created myself by concatenating two of the other columns into an identifier. I want only the row with the max date for each distinct value of that identifier. When I omit the Quantity column, I get the expected number of rows, but once I add Quantity I get rows I don't expect. How do I select only the max date row for each distinct value of my Identifier column?

For example, when I run this query...

Select 
 Distinct(L.ItemNo+' '+L.Lot) as Identifier
 ,Max(L.PostingDate) as PostingDate
 ,L.ItemNo
 ,L.Description
 ,L.Quantity
 ,L.Lot
From dbo.JournalLine L
Group By
 L.ItemNo
 ,L.Lot
 ,L.Description
 ,L.Quantity

I get the below results. The row here that I am not expecting is the row with 45 Apples.

Identifier   PostingDate  ItemNo  Description  Quantity  Lot
I123 LOT123  2021-06-01   I123    Celery       79        L123
I456 LOT456  2021-06-01   I456    Carrot       25        L456
I456 LOT654  2021-06-01   I654    Carrot       21        L654
I789 LOT789  2021-05-28   I789    Apple        45        L789
I789 LOT789  2021-06-01   I789    Apple        38        L789
I789 LOT555  2021-06-01   I789    Apple        11        L555

Use the window functions MAX() and FIRST_VALUE() to return, for each group, the latest PostingDate and the Quantity of the row that carries that latest PostingDate:

SELECT DISTINCT
       ItemNo + ' ' + Lot AS Identifier,
       MAX(PostingDate) OVER (PARTITION BY ItemNo, Lot, Description) AS PostingDate,
       ItemNo,
       Description,
       FIRST_VALUE(Quantity) OVER (PARTITION BY ItemNo, Lot, Description ORDER BY PostingDate DESC) AS Quantity,
       Lot
FROM dbo.JournalLine

The GROUP BY clause collects all rows that share the same values in the listed columns (here L.ItemNo, L.Lot, L.Description and L.Quantity) into one group, and lets aggregate functions be computed over the remaining columns of each group (here the aggregate is on L.PostingDate).

So every distinct combination of L.ItemNo, L.Lot, L.Description and L.Quantity produces one output row, with the aggregate computed over the rows that share that combination. Consider your example:

Assume the table contains the following records:

Identifier  PostingDate ItemNo  Description Quantity    Lot
I789 LOT789 2021-05-27  I789    Apple       45          L789
I789 LOT789 2021-05-29  I789    Apple       38          L789
I789 LOT789 2021-05-25  I789    Apple       45          L789
I789 LOT789 2021-05-28  I789    Apple       45          L789
I789 LOT789 2021-06-01  I789    Apple       38          L789
I789 LOT555 2021-06-01  I789    Apple       11          L555

When you group by Identifier, ItemNo, Description, Quantity and Lot, SQL splits the data into one logical group per distinct combination of those columns:

-- Group 1
Identifier  PostingDate ItemNo  Description Quantity    Lot
I789 LOT789 2021-05-27  I789    Apple       45          L789
I789 LOT789 2021-05-25  I789    Apple       45          L789
I789 LOT789 2021-05-28  I789    Apple       45          L789

-- Group 2
Identifier  PostingDate ItemNo  Description Quantity    Lot
I789 LOT789 2021-05-29  I789    Apple       38          L789
I789 LOT789 2021-06-01  I789    Apple       38          L789

-- Group 3
Identifier  PostingDate ItemNo  Description Quantity    Lot
I789 LOT555 2021-06-01  I789    Apple       11          L555

Any aggregate function in the query is then evaluated once per logical group (there are three groups in this case, as shown above).

In our case the aggregate is Max(L.PostingDate), which returns the latest PostingDate within each group:

-- Group 1
Identifier  PostingDate ItemNo  Description Quantity    Lot
I789 LOT789 2021-05-28  I789    Apple       45          L789

-- Group 2
Identifier  PostingDate ItemNo  Description Quantity    Lot
I789 LOT789 2021-06-01  I789    Apple       38          L789

-- Group 3
Identifier  PostingDate ItemNo  Description Quantity    Lot
I789 LOT555 2021-06-01  I789    Apple       11          L555

Finally, it combines the one row from each group and returns the result:

-- Final Output
Identifier  PostingDate ItemNo  Description Quantity    Lot
I789 LOT789 2021-05-28  I789    Apple       45          L789
I789 LOT789 2021-06-01  I789    Apple       38          L789
I789 LOT555 2021-06-01  I789    Apple       11          L555
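
To reproduce this grouping yourself, here is a minimal T-SQL sketch. The table variable @JournalLine and its column types are assumptions standing in for dbo.JournalLine, and the rows are the sample records listed above. Because the Identifier is built from ItemNo and Lot, it will read "I789 L789" rather than "I789 LOT789".

```sql
-- Hypothetical stand-in for dbo.JournalLine; the column types are assumed.
DECLARE @JournalLine TABLE (
    ItemNo      varchar(20),
    Lot         varchar(20),
    Description varchar(50),
    Quantity    int,
    PostingDate date
);

INSERT INTO @JournalLine (ItemNo, Lot, Description, Quantity, PostingDate)
VALUES ('I789', 'L789', 'Apple', 45, '2021-05-27'),
       ('I789', 'L789', 'Apple', 38, '2021-05-29'),
       ('I789', 'L789', 'Apple', 45, '2021-05-25'),
       ('I789', 'L789', 'Apple', 45, '2021-05-28'),
       ('I789', 'L789', 'Apple', 38, '2021-06-01'),
       ('I789', 'L555', 'Apple', 11, '2021-06-01');

-- Quantity is part of the grouping key, so quantities 45 and 38 form
-- separate groups for lot L789, each with its own MAX(PostingDate).
SELECT L.ItemNo + ' ' + L.Lot AS Identifier,
       MAX(L.PostingDate)     AS PostingDate,
       L.ItemNo,
       L.Description,
       L.Quantity,
       L.Lot
FROM @JournalLine AS L
GROUP BY L.ItemNo, L.Lot, L.Description, L.Quantity;
```

On these six rows the query should return three rows, one per group, which matches the final output above (row order is not guaranteed without an ORDER BY).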

This is how GROUP BY works. If you don't want Quantity to split the groups, remove it from both the SELECT list and the GROUP BY clause. As long as Quantity stays in the GROUP BY, you will keep getting one row for every distinct combination of the grouped columns, quantities included.
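
If you still need Quantity in the output, one possible approach (a sketch, not the only way) is to compute the max date per ItemNo and Lot in a derived table that leaves Quantity out of the grouping, then join back to the table to pick up the Quantity on that date. The table variable @JournalLine, its column types, and the alias names M and MaxPostingDate are assumptions made for illustration.

```sql
-- Hypothetical stand-in for dbo.JournalLine; the column types are assumed.
DECLARE @JournalLine TABLE (
    ItemNo      varchar(20),
    Lot         varchar(20),
    Description varchar(50),
    Quantity    int,
    PostingDate date
);

INSERT INTO @JournalLine (ItemNo, Lot, Description, Quantity, PostingDate)
VALUES ('I789', 'L789', 'Apple', 45, '2021-05-27'),
       ('I789', 'L789', 'Apple', 38, '2021-05-29'),
       ('I789', 'L789', 'Apple', 45, '2021-05-25'),
       ('I789', 'L789', 'Apple', 45, '2021-05-28'),
       ('I789', 'L789', 'Apple', 38, '2021-06-01'),
       ('I789', 'L555', 'Apple', 11, '2021-06-01');

-- The derived table M groups only by ItemNo and Lot, so Quantity no longer
-- splits the groups. The join then keeps the rows posted on the latest date.
SELECT L.ItemNo + ' ' + L.Lot AS Identifier,
       L.PostingDate,
       L.ItemNo,
       L.Description,
       L.Quantity,
       L.Lot
FROM @JournalLine AS L
JOIN (
    SELECT ItemNo, Lot, MAX(PostingDate) AS MaxPostingDate
    FROM @JournalLine
    GROUP BY ItemNo, Lot
) AS M
  ON  M.ItemNo = L.ItemNo
  AND M.Lot = L.Lot
  AND M.MaxPostingDate = L.PostingDate;
```

On the sample rows this should return two rows: quantity 38 for lot L789 and quantity 11 for lot L555, both dated 2021-06-01. If several rows for the same ItemNo and Lot share the latest PostingDate, the join returns all of them.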

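You can use a window function. ROW_NUMBER() numbers the rows within each ItemNo and Lot from the newest PostingDate to the oldest, and the outer query keeps only the first row of each: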

select * from (
  select * , row_number() over (partition by ItemNo,Lot order by PostingDate desc) rn
  from dbo.JournalLine
) l
where rn = 1
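
Note that `select *` also returns the helper column rn and does not build the Identifier. A sketch that returns the six columns from the question could look like the following. The table variable @JournalLine and its column types are assumptions standing in for dbo.JournalLine, and the sample rows are taken from this thread.

```sql
-- Hypothetical stand-in for dbo.JournalLine; the column types are assumed.
DECLARE @JournalLine TABLE (
    ItemNo      varchar(20),
    Lot         varchar(20),
    Description varchar(50),
    Quantity    int,
    PostingDate date
);

INSERT INTO @JournalLine (ItemNo, Lot, Description, Quantity, PostingDate)
VALUES ('I789', 'L789', 'Apple', 45, '2021-05-27'),
       ('I789', 'L789', 'Apple', 38, '2021-05-29'),
       ('I789', 'L789', 'Apple', 45, '2021-05-25'),
       ('I789', 'L789', 'Apple', 45, '2021-05-28'),
       ('I789', 'L789', 'Apple', 38, '2021-06-01'),
       ('I789', 'L555', 'Apple', 11, '2021-06-01');

-- Number the rows of each ItemNo/Lot pair from newest to oldest,
-- then keep only the newest one and list the output columns explicitly.
SELECT l.ItemNo + ' ' + l.Lot AS Identifier,
       l.PostingDate,
       l.ItemNo,
       l.Description,
       l.Quantity,
       l.Lot
FROM (
    SELECT ItemNo, Lot, Description, Quantity, PostingDate,
           ROW_NUMBER() OVER (PARTITION BY ItemNo, Lot
                              ORDER BY PostingDate DESC) AS rn
    FROM @JournalLine
) AS l
WHERE l.rn = 1;
```

On the sample rows this should return one row per ItemNo and Lot: quantity 38 for lot L789 and quantity 11 for lot L555. When two rows share the latest PostingDate, ROW_NUMBER() keeps only one of them and the choice is not deterministic; using RANK() instead of ROW_NUMBER() would keep all tied rows.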
