简体   繁体   中英

Hive Query: Selecting column over a partition based on a median of a different column

I need help in modeling a query as I am unable to do it.

My data is:

id   name   school   height
1    A      S1       10
2    B      S1       12
3    C      S1       14
4    D      S2       15
5    E      S2       16
6    F      S2       17

I want to select the name and the name with median height per school.

Expected output:

id   name  school  myval
1    A    S1    B
2    B    S1    B
3    C    S1    B
4    D    S2    E
5    E    S2    E
6    F    S2    E

Here, person B has the median height in school S1 and E has in S2.

I know we can get median using percentile. But I am not able to figure out how to select the value per partition basis.

below query will work :-

select 
  temp1.id,
  temp1.name,
  temp1.school,
  temp2.name 
from 
  (select 
     id,
     name,
     school,
     height 
  from 
     TABLE_NAME
  ) temp1
  left Join        
   (select 
      school,
      name 
    from 
      (select 
        id,
        name,
        school,
        height,
        SUM(height) OVER 
           (PARTITION BY school)/COUNT(height) OVER 
               (PARTITION BY school) as avg 
      from 
        TABLE_NAME) AVERG 
   where height=avg ) temp2 on temp1.school=temp2.school ;

这给出了中值列

select a.id,a.name,a.school,a.height, b.median from your_table a join (select school, CAST(percentile(CAST(height as BIGINT),0.5) as INT) as median from your_table group by school) b on a.school = b.school;

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM