Dropping duplicates from a Hive table, need to write out the dropped records and grab a count
Selecting records from a Hive table where there are duplicates with a given criteria
These conditions can be prioritized with a case expression in the order by of a window function such as row_number.
select A,B,frequency,timekey
from (select t.*
,row_number() over(partition by A order by cast((B = 'unknown') as int), B) as rnum
from tbl t
) t
where rnum = 1
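Here, within each group of rows that share the same A, rows where B is not 'unknown' are ranked first, because cast((B = 'unknown') as int) is 0 for them and 1 for the 'unknown' rows. Ties are then broken by the value of B, and rnum = 1 keeps only the top-ranked row per A.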
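The ordering described above can be tried without a Hive cluster. The following is a minimal sketch, assuming SQLite (through Python's sqlite3 module, SQLite 3.25 or later for window functions) as a stand-in for Hive. The sample rows are made up for illustration; the table name tbl and the columns come from the query above.

```python
import sqlite3

# Hypothetical sample rows (A, B, frequency, timekey); not from the original question.
SAMPLE_ROWS = [
    ("a1", "unknown", 5, 20200101),
    ("a1", "y", 5, 20200101),
    ("a1", "x", 5, 20200101),
    ("a2", "unknown", 7, 20200102),
]

# Same query as above, with an outer "order by A" added only to make the output stable.
KEEP_QUERY = """
select A, B, frequency, timekey
from (select t.*,
             row_number() over (partition by A
                                order by cast((B = 'unknown') as int), B) as rnum
      from tbl t
     ) t
where rnum = 1
order by A
"""

def pick_one_per_a(rows):
    """Load rows into an in-memory table and return one row per value of A."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table tbl (A text, B text, frequency int, timekey int)")
    conn.executemany("insert into tbl values (?, ?, ?, ?)", rows)
    result = conn.execute(KEEP_QUERY).fetchall()
    conn.close()
    return result

kept = pick_one_per_a(SAMPLE_ROWS)
print(kept)
```

For a1 the row with B = 'x' is kept, since it is not 'unknown' and sorts before 'y'. For a2 the only row available is the 'unknown' one, so it is kept.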
Use the row_number analytic function. If you want to select a record that is not unknown first, use the following query:
select A, B, Frequency, timekey
from
(select
A, B, Frequency, timekey,
row_number() over(partition by A,Frequency order by case when B='unknown' then 1 else 0 end) rn
from tbl -- the original omitted the from clause; table name assumed from the first query
)s where rn=1
If you want to select the unknown record instead (if one exists), use the following row_number in the query above:
row_number() over(partition by A,Frequency order by case when B='unknown' then 0 else 1 end) rn
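The same row number also identifies the rows that are discarded: anything with rn > 1. The following is a minimal sketch of writing out those dropped rows and counting them, assuming SQLite (through Python's sqlite3 module, SQLite 3.25 or later) as a stand-in for Hive. The table name tbl and the sample rows are assumptions for illustration, and the outer order by is added only to make the output stable.

```python
import sqlite3

# Hypothetical sample rows (A, B, Frequency, timekey); not from the original question.
DUP_ROWS = [
    ("a1", "x", 5, 1),
    ("a1", "unknown", 5, 2),
    ("a1", "unknown", 5, 3),
    ("a2", "unknown", 7, 4),
]

# Rows ranked after the first in each (A, Frequency) group are the dropped duplicates.
DROPPED_QUERY = """
select A, B, Frequency, timekey
from (select A, B, Frequency, timekey,
             row_number() over (partition by A, Frequency
                                order by case when B = 'unknown' then 1 else 0 end) rn
      from tbl
     ) s
where rn > 1
order by timekey
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table tbl (A text, B text, Frequency int, timekey int)")
conn.executemany("insert into tbl values (?, ?, ?, ?)", DUP_ROWS)
dropped = conn.execute(DROPPED_QUERY).fetchall()
conn.close()

dropped_count = len(dropped)
print(dropped, dropped_count)
```

In Hive itself the dropped rows could be written to another table with the same rn > 1 filter. Note that when several rows tie on the order by expression, which of them receives rn = 1 is not guaranteed, so adding a tie-breaking column to the order by makes the split reproducible.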