I have a partitioned table in hive. The schema and a sample is shown below
item_id | price | brand | partition_id
AX_12 340.22 Apple. 356
AZ_47 230.00 Samsung 357
AX_12 321.00. Apple. 357
AQ_17. 125.00 Lenovo. 356
If an item is present in multiple partitions. I need to select the row with latest partition So the expected output for this example is this
item_id | price | brand | partition_id
AX_12 321.00 Apple. 357
AZ_47 230.00 Samsung 357
AQ_17. 125.00 Lenovo. 356
There are 10 partitions in the table and each partition has 10 million rows
You can use window functions to filter on the top record per group:
select t.*
from (
select t.*, row_number() over(partition by item_id order by partition_id desc) rn
from mytable t
)
where rn = 1
A typical alternative is to filter with a correlated subquery:
select t.*
from mytable t
where t.partition_id = (
select max(t1.partition_id) from mytbale t1 where t1.item_id = t.item_id
)
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.