繁体   English   中英

Apache Hive查询HiveQL

[英]Apache Hive Query HiveQL

我正在学习Hive,想编写一个优化的HiveQL / SQL查询

我的桌子看起来像这样:

CREATE TABLE sales (dealer VARCHAR(25), make VARCHAR(25), type VARCHAR(25), day INT);
INSERT INTO sales (dealer, make, type, day) VALUES
("Xyz", "Highlander", "SUV", "0"),
("Xyz", "Prius", "HATCH", "1"),
("Xyz", "Prius", "HATCH", "2"),
("Xyz", "Prius", "HATCH", "3"),
("Xyz", "Versa", "HATCH", "1"),
("Xyz", "Versa", "HATCH", "2"),
("Xyz", "Versa", "HATCH", "3"),
("Xyz", "S3", "SEDAN", "1"),
("Xyz", "S3", "SEDAN", "2"),
("Abc", "Forrester", "SUV", "1");

给定一个“经销商” D ,我想在单个查询中计算过去X天中每个“类型”的前N个 “制造”。

SELECT dealer, make, type, COUNT(*) AS frequency FROM sales
WHERE day > 0 AND dealer LIKE 'Xyz' GROUP BY make, type
ORDER BY frequency DESC LIMIT 5

问题是,在前1个“ make”和“ type”上使用GROUP BY时,我只会得到:

DEALER, MAKE, TYPE, COUNT
Xyz, Prius, Hatch, 3
Xyz, Versa, Hatch, 3
Xyz, S3, Sedan, 2
...

但我想要

Xyz, Prius, Hatch, 3
Xyz, S3, Sedan, 2
...

对于每个 “类型”的前N个。

有人可以帮助我了解如何编写这样的查询吗?

SQL小提琴 http://sqlfiddle.com/#!2/df9304/5

****更新****

似乎rank()会很有用:

Hive按查询分组获得前n个记录

https://blogs.oracle.com/taylor22/entry/hive_0_11_may_15

HiveQL和rank()

阅读更多文档和链接问题的提示后:

SELECT dealer, make, rank, type FROM (
    SELECT dealer, make, rank() OVER (PARTITION BY type ORDER BY count DESC) AS rank, type FROM (
        SELECT dealer, make, count(*) AS count, type FROM Sales WHERE dealer = "Xyz" GROUP BY dealer, type, make
    ) CountedSales
) RankedSales
WHERE RankedSales.rank < 3;

内部查询进行计数,中间查询执行rank(),外部查询对等级进行限制。

销售表内容

hive> select * from Sales;
OK
Xyz      Highlander      SUV    NULL
Xyz      Highlander      SUV    NULL
Xyz      Rouge   SUV    NULL
Xyz      Rouge   SUV    NULL
Xyz      Prius   HATCH  NULL
Xyz      Prius   HATCH  NULL
Xyz      Prius   HATCH  NULL
Xyz      Versa   HATCH  NULL
Xyz      S3      SEDAN  NULL
Xyz      S3      SEDAN  NULL
Xyz      S3      SEDAN  NULL
Xyz      A8      SEDAN  NULL
Xyz      A8      SEDAN  NULL
Xyz      A8      SEDAN  NULL
Xyz      A8      SEDAN  NULL
Time taken: 0.054 seconds, Fetched: 15 row(s)

现在实际查询。

hive> SELECT dealer, make, rank, type FROM (                                                                          
    >     SELECT dealer, make, rank() OVER (PARTITION BY type ORDER BY count DESC) AS rank, type FROM (
    >         SELECT dealer, make, count(*) AS count, type FROM Sales WHERE dealer = "Xyz" GROUP BY dealer, type, make
    >     ) CountedSales
    > ) RankedSales
    > WHERE RankedSales.rank < 3;
...
Execution completed successfully
MapredLocal task succeeded
OK
Xyz      Prius  1        HATCH
Xyz      Versa  2        HATCH
Xyz      A8     1        SEDAN
Xyz      S3     2        SEDAN
Xyz      Rouge  1        SUV
Xyz      Highlander     1        SUV
Time taken: 28.491 seconds, Fetched: 6 row(s)

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM