
Selecting distinct values from database

I have a table as follows:

ParentActivityID | ActivityID | Timestamp
-----------------+------------+----------
1                | A1         | T1
2                | A2         | T2
1                | A1         | T1
1                | A1         | T5

I want to select each unique ParentActivityID along with a Timestamp. The timestamp can be either the most recent one or the first one that occurs in the table.

I tried to use DISTINCT, but I came to realise that it doesn't work on individual columns. I am new to SQL. Any help in this regard will be highly appreciated.

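DISTINCT applies to the whole select list, so it behaves as you expect only when you select a single column. With multiple columns it returns the distinct combinations, which is the same result you get with GROUP BY: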

SELECT ParentActivityID, Timestamp
FROM MyTable
GROUP BY ParentActivityID, Timestamp
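
To check that on the sample data, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory database, the text values 'T1', 'T2' and 'T5' standing in for real timestamps, and the added ORDER BY are assumptions made for illustration; other database engines may differ in details.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (ParentActivityID INTEGER, ActivityID TEXT, Timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [(1, "A1", "T1"), (2, "A2", "T2"), (1, "A1", "T1"), (1, "A1", "T5")],
)

# DISTINCT over two columns returns distinct (ParentActivityID, Timestamp) pairs.
distinct_rows = conn.execute(
    "SELECT DISTINCT ParentActivityID, Timestamp FROM MyTable "
    "ORDER BY ParentActivityID, Timestamp"
).fetchall()

# GROUP BY on the same two columns collapses the rows in the same way.
grouped_rows = conn.execute(
    "SELECT ParentActivityID, Timestamp FROM MyTable "
    "GROUP BY ParentActivityID, Timestamp "
    "ORDER BY ParentActivityID, Timestamp"
).fetchall()

print(distinct_rows)
print(grouped_rows)
```

Both queries still return two rows for ParentActivityID 1, because (1, T1) and (1, T5) are different pairs. Only the exact duplicate (1, T1) is collapsed.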

Actually I want only one row per ParentActivityID. Your solution will give each distinct pair of ParentActivityID and Timestamp. For example, if I have [1, T1], [2, T2], [1, T3], then I want the result to be [1, T3] and [2, T2].

You need to decide which of the many timestamps to pick. If you want the earliest one, use MIN:

SELECT ParentActivityID, MIN(Timestamp)
FROM MyTable
GROUP BY ParentActivityID

GROUP BY is what you need here. Group by ParentActivityID and state that you want the most recent timestamp among all rows with the same ParentActivityID:

SELECT ParentActivityID, MAX(Timestamp) FROM MyTable GROUP BY ParentActivityID

The GROUP BY operator is like taking rows from a table and putting them into a map whose key is defined by the GROUP BY clause (ParentActivityID in this example). You have to define how the grouping handles rows with duplicate keys. For this there are various aggregate functions, which you apply to the columns you want to select that are not part of the key (not listed in the GROUP BY clause; think of them as the values in the map).

Some databases (such as MySQL, depending on its SQL mode) also allow you to select columns that are not part of the GROUP BY clause (not in the key) without applying an aggregate function to them. In that case you get an arbitrary value for the column (this is like blindly overwriting the value in a map with a new value every time). However, the SQL standard and most databases generally do not allow this. There you can use an aggregate function such as min() or max(), or first() or last() where the database provides them, to work around it.
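
The map analogy can be sketched in plain Python. This is only an illustration of the idea, not how any database actually implements grouping; the helper names `group_max` and `group_overwrite` are made up for this example, and the strings 'T1', 'T2', 'T5' stand in for real timestamps.

```python
# Sample rows from the question: (ParentActivityID, ActivityID, Timestamp).
rows = [(1, "A1", "T1"), (2, "A2", "T2"), (1, "A1", "T1"), (1, "A1", "T5")]


def group_max(rows):
    """Like MAX(Timestamp) ... GROUP BY ParentActivityID: keep the largest value per key."""
    result = {}
    for parent_id, _activity_id, ts in rows:
        if parent_id not in result or ts > result[parent_id]:
            result[parent_id] = ts
    return result


def group_overwrite(rows):
    """Like selecting a non-aggregated column: the last row seen wins."""
    result = {}
    for parent_id, _activity_id, ts in rows:
        result[parent_id] = ts  # blind overwrite, no aggregate applied
    return result


# The aggregate gives the same answer regardless of row order.
print(group_max(rows), group_max(rows[::-1]))

# The blind overwrite depends on the order in which rows are visited.
print(group_overwrite(rows), group_overwrite(rows[::-1]))
```

The order dependence of the second helper is the reason a non-aggregated column in a grouped query should not be relied on.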

Try this:

SELECT [ParentActivityId],
       MIN([Timestamp]) AS [FirstTimestamp],
       MAX([Timestamp]) AS [RecentTimestamp]
FROM [Table]
GROUP BY [ParentActivityId]

This gives you the first timestamp and the most recent timestamp for each ParentActivityId present in your table. You can pick whichever one you need.

Use a CTE to get the latest row for each parent ID; you can then choose any columns from the entire row in the output.
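
As a quick check of the MIN/MAX query against the sample data, here is a small sketch with Python's sqlite3 module. The table name MyTable, the in-memory database, and the text values 'T1', 'T2', 'T5' used in place of real timestamps are assumptions; with text values, MIN and MAX compare the strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (ParentActivityId INTEGER, ActivityId TEXT, Timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [(1, "A1", "T1"), (2, "A2", "T2"), (1, "A1", "T1"), (1, "A1", "T5")],
)

# One output row per ParentActivityId, carrying both the earliest and latest value.
first_and_recent = conn.execute(
    """
    SELECT ParentActivityId,
           MIN(Timestamp) AS FirstTimestamp,
           MAX(Timestamp) AS RecentTimestamp
    FROM MyTable
    GROUP BY ParentActivityId
    ORDER BY ParentActivityId
    """
).fetchall()

print(first_and_recent)
```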

;WITH cte_parent AS
(
    SELECT ParentActivityId, ActivityId, TimeStamp,
           ROW_NUMBER() OVER (PARTITION BY ParentActivityId ORDER BY TimeStamp DESC) AS RNO
    FROM YourTable
)
SELECT *
FROM cte_parent
WHERE RNO = 1
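
The same ROW_NUMBER approach can be tried out with Python's sqlite3 module, as a rough sketch. It assumes the bundled SQLite is version 3.25 or newer (the first release with window functions), uses an in-memory table named YourTable with the question's sample rows, and drops the leading semicolon, which is only a SQL Server habit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourTable (ParentActivityId INTEGER, ActivityId TEXT, TimeStamp TEXT)"
)
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(1, "A1", "T1"), (2, "A2", "T2"), (1, "A1", "T1"), (1, "A1", "T5")],
)

# Number the rows within each ParentActivityId, newest first, then keep row 1.
latest_rows = conn.execute(
    """
    WITH cte_parent AS (
        SELECT ParentActivityId, ActivityId, TimeStamp,
               ROW_NUMBER() OVER (
                   PARTITION BY ParentActivityId
                   ORDER BY TimeStamp DESC
               ) AS RNO
        FROM YourTable
    )
    SELECT ParentActivityId, ActivityId, TimeStamp
    FROM cte_parent
    WHERE RNO = 1
    ORDER BY ParentActivityId
    """
).fetchall()

print(latest_rows)
```

Unlike the MIN/MAX queries, this returns the whole latest row, so other columns such as ActivityId come from the same row as the chosen timestamp.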


 