Using row_number() in Hive SQL, I can filter out duplicates and pick the first instance of each id by selecting seq = 1 in the where clause, as below. What I need is a way to find the last instance in each group.
select * from
(select c1,c2,c3,c4,c5,id, row_number() over(partition by id ORDER BY id) as seq
from
table) as cnt where seq = 1;
For example, suppose id 1212 has 3 instances and id 1313 has 5 instances in the table, as shown below. The query above returns only the first instance of each id (seq = 1). What I want instead is the row with seq = 3 for id 1212 and the row with seq = 5 for id 1313.
c1 c2 c3 c4 c5 id seq
2020 2020 2020 2020 2020 1212 1
2021 2020 2021 2020 2021 1212 2
2022 2020 2022 2020 2022 1212 3
2023 2020 2023 2020 2023 1313 1
2024 2020 2024 2020 2024 1313 2
2025 2020 2025 2020 2025 1313 3
2026 2020 2026 2020 2026 1313 4
2026 2020 2026 2020 2026 1313 5
Add an extra column with COUNT(*) OVER (PARTITION BY id) AS cnt. It contains the number of rows in the group, which is also the maximum ROW_NUMBER value for that group, so the last row of each group is the one where seq = cnt.
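A minimal sketch of the COUNT(*) OVER idea, run against SQLite from Python as a stand-in for Hive so it can be checked locally. The table name my_table and the choice of c1 as the ordering column are assumptions for illustration; the sample rows are the ones from the question.

```python
import sqlite3

# Sample rows copied from the question: (c1, c2, c3, c4, c5, id).
ROWS = [
    (2020, 2020, 2020, 2020, 2020, 1212),
    (2021, 2020, 2021, 2020, 2021, 1212),
    (2022, 2020, 2022, 2020, 2022, 1212),
    (2023, 2020, 2023, 2020, 2023, 1313),
    (2024, 2020, 2024, 2020, 2024, 1313),
    (2025, 2020, 2025, 2020, 2025, 1313),
    (2026, 2020, 2026, 2020, 2026, 1313),
    (2026, 2020, 2026, 2020, 2026, 1313),
]

# seq numbers the rows of each id; cnt is the size of that id's group.
# The last row of a group is the one where seq equals cnt.
# Assumption: c1 defines the order inside each group.
QUERY = """
select c1, c2, c3, c4, c5, id, seq
from (
    select c1, c2, c3, c4, c5, id,
           row_number() over (partition by id order by c1) as seq,
           count(*)     over (partition by id)             as cnt
    from my_table
) t
where seq = cnt
order by id
"""


def last_rows_by_count():
    """Return the last row of each id, found by comparing seq with cnt."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table my_table (c1, c2, c3, c4, c5, id)")
    conn.executemany("insert into my_table values (?, ?, ?, ?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in last_rows_by_count():
        print(row)
```

The inner select ... where seq = cnt part should carry over to Hive largely unchanged, since row_number() and count(*) with an over clause are standard window functions; only the Python and SQLite wrapper is specific to this sketch.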
-- max(seq) is used as a plain aggregate; mixing an over() clause with group by id fails here
select id, max(seq) as max_seq from
(select *, row_number() over(partition by id ORDER BY id) as seq
from
table) maxseq
group by id
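Use all those columns in the group by and use max on the row_number(). Note that this returns one row per distinct combination of column values (collapsing exact duplicates), not a single last row per id: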
select c1,c2,c3,c4,c5,id,max(r_no)
from
(
select c1,c2,c3,c4,c5,id, row_number() over (partition by id ORDER BY c1,c2,c3,c4,c5,id) as r_no
from
table
) a
group by c1,c2,c3,c4,c5,id
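Change the ascending sort to a descending sort. Because every row in a partition has the same id, ordering by id alone leaves the order within the group arbitrary; to get a well-defined last row, replace id in the ORDER BY with a column that defines the sequence: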
select t.*
from (select c1, c2, c3, c4, c5, id,
row_number() over (partition by id ORDER BY id desc) as seqnum
------------------------------------------------------------^
from table
) t
where seqnum = 1;
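The descending-sort variant can be sketched the same way, again using SQLite from Python as a stand-in for Hive. The table name my_table and ordering by c1 are assumptions for illustration; in a real table, use whichever column defines "last" (a timestamp, for instance).

```python
import sqlite3

# Sample rows copied from the question: (c1, c2, c3, c4, c5, id).
ROWS = [
    (2020, 2020, 2020, 2020, 2020, 1212),
    (2021, 2020, 2021, 2020, 2021, 1212),
    (2022, 2020, 2022, 2020, 2022, 1212),
    (2023, 2020, 2023, 2020, 2023, 1313),
    (2024, 2020, 2024, 2020, 2024, 1313),
    (2025, 2020, 2025, 2020, 2025, 1313),
    (2026, 2020, 2026, 2020, 2026, 1313),
    (2026, 2020, 2026, 2020, 2026, 1313),
]

# Numbering the rows in descending order puts the last row of each id
# at seqnum = 1, so the original "where seqnum = 1" filter can be kept.
# Assumption: c1 defines the order inside each group.
QUERY = """
select c1, c2, c3, c4, c5, id
from (
    select c1, c2, c3, c4, c5, id,
           row_number() over (partition by id order by c1 desc) as seqnum
    from my_table
) t
where seqnum = 1
order by id
"""


def last_rows_by_desc():
    """Return the last row of each id, found with a descending row_number()."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table my_table (c1, c2, c3, c4, c5, id)")
    conn.executemany("insert into my_table values (?, ?, ?, ?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in last_rows_by_desc():
        print(row)
```

Compared with the count-based approach, this needs only one window function, but it does not expose the group size alongside the row.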