Select max value of each group that depends on other column

 EmpNumber City                                                        Total Sales
 ----------------------------------------------------------------------------------
      1811 Boston                                                      $14557260.03
      1862 Boston                                                      $12435892.06
      1873 Boston                                                       $9786058.60
      1803 Chichago                                                    $18266965.58
      1825 Chichago                                                    $11958100.98
      1877 Chichago                                                    $15569868.52

My table looks like the one above. How do I get the best employee from each city according to their sales?

Desired output:

EmpNumber City                                                        Total Sales
----------------------------------------------------------------------------------
     1811 Boston                                                      $14557260.03
     1803 Chichago                                                    $18266965.58

I have tried:

select employeenumber, city, max(TotalSales) 
from(
select employeenumber, a.city, sum(quantityordered*priceeach)  as TotalSales
from offices a, employees b, customers c, orders d, orderdetails e
where a.officeCode = b.officeCode
and   b.employeenumber = c.salesrepemployeenumber
and   c.customernumber = d.customernumber
and   d.ordernumber = e.ordernumber
group by employeenumber, a.city
order by a.city)
group by employeenumber, city;

But I still get 3 employees from Boston and 3 employees from Chichago. What I want is only ONE employee from each city. Thank you.

Just use the row_number() analytic function:

select employeenumber, city, TotalSales
  from
  (
   -- total sales per employee, ranked within each city (highest first)
   select e.employeenumber, o.city,
          sum(nvl(quantityordered,0)*nvl(priceeach,0)) as TotalSales,
          row_number() over
        ( partition by o.city order by sum(nvl(quantityordered,0)*nvl(priceeach,0)) desc )
          as rn
     from offices o
     join employees e on o.officeCode = e.officeCode
     join customers c on e.employeenumber = c.salesrepemployeenumber
     join orders ord on c.customernumber = ord.customernumber
     join orderdetails odd on ord.ordernumber = odd.ordernumber
    group by e.employeenumber, o.city
   )
 where rn = 1

If a tie (equal TotalSales) occurs for the top value within a city and all tied employees should be included in the result, replace row_number() with dense_rank(), which is another analytic function.
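To see the difference between the two functions on a tie, here is a minimal sketch using Python's built-in sqlite3 module rather than Oracle. The `sales` table, its columns, the figures, and the `top_per_city` helper are all made up for illustration, and it assumes the bundled SQLite is 3.25 or newer (the first release with window functions).

```python
import sqlite3

# Hypothetical per-employee totals, with a deliberate tie for first place in Boston.
rows = [
    (1811, "Boston", 500.0),
    (1862, "Boston", 500.0),
    (1873, "Boston", 300.0),
    (1803, "Chichago", 900.0),
    (1825, "Chichago", 400.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table sales (empnumber integer, city text, totalsales real)")
conn.executemany("insert into sales values (?, ?, ?)", rows)


def top_per_city(ranking_function, order_by):
    """Return the rows ranked first in each city.

    ranking_function: "row_number" or "dense_rank" (illustrative helper only).
    order_by: the ORDER BY expression used inside the window.
    """
    query = f"""
        select empnumber, city, totalsales
          from (select empnumber, city, totalsales,
                       {ranking_function}() over
                         (partition by city order by {order_by}) as rn
                  from sales)
         where rn = 1
         order by city, empnumber
    """
    return conn.execute(query).fetchall()


# row_number() numbers every row differently, so exactly one row per city survives.
# The extra empnumber sort key makes the choice between tied rows deterministic.
one_each = top_per_city("row_number", "totalsales desc, empnumber")

# dense_rank() gives tied rows the same rank, so both Boston leaders survive.
with_ties = top_per_city("dense_rank", "totalsales desc")

print(one_each)
print(with_ties)
```

Without a tie-breaking sort key, which of the tied rows row_number() keeps is not guaranteed, so it is worth adding one whenever exactly one row per group is required.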

This gives the desired result once you have created a temp table named tbl from the first dataset shared above (with Total Sales stored as a number).

  select b.EmpNumber, a.City, a.Max_Sales as `Max Sales` from
    (select City, max(`Total Sales`) as Max_Sales
      from tbl group by City) a
        left join
    (select City, `Total Sales` as drop_later, EmpNumber from tbl) b
       on a.City = b.City and a.Max_Sales = b.drop_later

This is the output in Spark SQL:

   EmpNumber      City    Max Sales
0       1811    Boston  14557260.03
1       1803  Chichago  18266965.58
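The same join-against-the-maximum idea can be tried without a Spark session. The following is a rough sketch using Python's built-in sqlite3 module, loaded with the six rows from the question. The column name `TotalSales` (no space) and the in-memory table are assumptions made for this example. It matches on City as well as the sales figure, so a top value in one city cannot pick up an employee from another city who happens to have the same total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: the question's three columns, with sales stored as a number.
conn.execute("create table tbl (EmpNumber integer, City text, TotalSales real)")
conn.executemany(
    "insert into tbl values (?, ?, ?)",
    [
        (1811, "Boston", 14557260.03),
        (1862, "Boston", 12435892.06),
        (1873, "Boston", 9786058.60),
        (1803, "Chichago", 18266965.58),
        (1825, "Chichago", 11958100.98),
        (1877, "Chichago", 15569868.52),
    ],
)

query = """
select b.EmpNumber, a.City, a.Max_Sales
  from (select City, max(TotalSales) as Max_Sales
          from tbl
         group by City) a              -- one row per city with its top figure
  join tbl b
    on b.City = a.City                 -- stay inside the same city
   and b.TotalSales = a.Max_Sales      -- keep only the employee(s) at the top
 order by a.City
"""
result = conn.execute(query).fetchall()
for emp_number, city, max_sales in result:
    print(emp_number, city, max_sales)
```

If two employees in a city share the top figure, this approach returns both of them, which is the same behaviour as the dense_rank() variant.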
