Apache Hive Query HiveQL
I am learning Hive and would like to write an optimized HiveQL/SQL query.
My table looks like this:
CREATE TABLE sales (dealer VARCHAR(25), make VARCHAR(25), type VARCHAR(25), day INT);
INSERT INTO sales (dealer, make, type, day) VALUES
("Xyz", "Highlander", "SUV", "0"),
("Xyz", "Prius", "HATCH", "1"),
("Xyz", "Prius", "HATCH", "2"),
("Xyz", "Prius", "HATCH", "3"),
("Xyz", "Versa", "HATCH", "1"),
("Xyz", "Versa", "HATCH", "2"),
("Xyz", "Versa", "HATCH", "3"),
("Xyz", "S3", "SEDAN", "1"),
("Xyz", "S3", "SEDAN", "2"),
("Abc", "Forrester", "SUV", "1");
Given a "dealer" D, I want to compute, in a single query, the top N "make" values for each "type" over the past X days.
SELECT dealer, make, type, COUNT(*) AS frequency FROM sales
WHERE day > 0 AND dealer LIKE 'Xyz' GROUP BY make, type
ORDER BY frequency DESC LIMIT 5
The problem is that when I use GROUP BY on "make" and "type" and ask for the top 1, I get:
DEALER, MAKE, TYPE, COUNT
Xyz, Prius, Hatch, 3
Xyz, Versa, Hatch, 3
Xyz, S3, Sedan, 2
...
But I want:
Xyz, Prius, Hatch, 3
Xyz, S3, Sedan, 2
...
that is, the top N for each "type".
Can someone help me understand how to write such a query?
SQL Fiddle: http://sqlfiddle.com/#!2/df9304/5
****UPDATE****
It looks like rank() will be useful.
After reading more of the documentation and the hints in the linked question:
SELECT dealer, make, rank, type FROM (
SELECT dealer, make, rank() OVER (PARTITION BY type ORDER BY count DESC) AS rank, type FROM (
SELECT dealer, make, count(*) AS count, type FROM Sales WHERE dealer = "Xyz" GROUP BY dealer, type, make
) CountedSales
) RankedSales
WHERE RankedSales.rank < 3;
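The inner query does the counting, the middle query applies rank() within each type, and the outer query filters on the rank (rank < 3 keeps the top two ranks per type).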
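The query above does not filter on day, although the question asks for the past X days. Here is a minimal sketch of how the day filter and N could be put back in. The values are assumptions for illustration only: N = 2, and "recent" is taken to mean day > 0 as in the first query. The aliases frequency and rnk are my own choice, used so that column names do not shadow the count and rank functions.

```sql
-- Sketch only: top N makes per type for one dealer, restricted to recent days.
-- Assumed values (not from the original post): N = 2, recent means day > 0.
SELECT dealer, type, make, frequency, rnk
FROM (
  -- Middle query: rank the makes inside each type by how often they were sold.
  SELECT dealer, type, make, frequency,
         rank() OVER (PARTITION BY type ORDER BY frequency DESC) AS rnk
  FROM (
    -- Inner query: count sales per (dealer, type, make) within the day window.
    SELECT dealer, type, make, count(*) AS frequency
    FROM sales
    WHERE dealer = 'Xyz'
      AND day > 0
    GROUP BY dealer, type, make
  ) counted_sales
) ranked_sales
-- Outer query: keep the top N ranks per type (N = 2 here).
WHERE rnk <= 2;
```

To change N or the day window, adjust the two literals; the structure of the three nested queries stays the same.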
Contents of the Sales table:
hive> select * from Sales;
OK
Xyz Highlander SUV NULL
Xyz Highlander SUV NULL
Xyz Rouge SUV NULL
Xyz Rouge SUV NULL
Xyz Prius HATCH NULL
Xyz Prius HATCH NULL
Xyz Prius HATCH NULL
Xyz Versa HATCH NULL
Xyz S3 SEDAN NULL
Xyz S3 SEDAN NULL
Xyz S3 SEDAN NULL
Xyz A8 SEDAN NULL
Xyz A8 SEDAN NULL
Xyz A8 SEDAN NULL
Xyz A8 SEDAN NULL
Time taken: 0.054 seconds, Fetched: 15 row(s)
Now the actual query:
hive> SELECT dealer, make, rank, type FROM (
> SELECT dealer, make, rank() OVER (PARTITION BY type ORDER BY count DESC) AS rank, type FROM (
> SELECT dealer, make, count(*) AS count, type FROM Sales WHERE dealer = "Xyz" GROUP BY dealer, type, make
> ) CountedSales
> ) RankedSales
> WHERE RankedSales.rank < 3;
...
Execution completed successfully
MapredLocal task succeeded
OK
Xyz Prius 1 HATCH
Xyz Versa 2 HATCH
Xyz A8 1 SEDAN
Xyz S3 2 SEDAN
Xyz Rouge 1 SUV
Xyz Highlander 1 SUV
Time taken: 28.491 seconds, Fetched: 6 row(s)
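Note that Rouge and Highlander both come back with rank 1 in the output above, because they have the same count and rank() gives tied rows the same rank. If exactly N rows per type are wanted, one option is row_number() with an explicit tie-breaker. This is a sketch under assumptions: N = 1, and ties are broken alphabetically by make, which is an arbitrary choice of mine rather than anything required by the question.

```sql
-- Sketch only: exactly one make per type, with ties broken by make name.
-- Assumed: N = 1 and alphabetical tie-breaking (both are illustrative choices).
SELECT dealer, type, make, frequency
FROM (
  SELECT dealer, type, make, frequency,
         -- row_number() never repeats a number inside a partition,
         -- so tied makes get consecutive positions instead of sharing one.
         row_number() OVER (PARTITION BY type
                            ORDER BY frequency DESC, make) AS rn
  FROM (
    SELECT dealer, type, make, count(*) AS frequency
    FROM Sales
    WHERE dealer = 'Xyz'
    GROUP BY dealer, type, make
  ) counted_sales
) numbered_sales
WHERE rn <= 1;
```

Against the table contents shown earlier, this should keep Prius for HATCH, A8 for SEDAN, and Highlander for SUV (Highlander sorts before Rouge when both have a count of 2). If tied makes should all be kept, rank() as in the original query is the better fit.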