
How to query many tables in one shot in Hive?

I have to query and then "union" many tables. I did it manually in Hive, but I'm wondering if there's a more optimal (shorter) way to do it.

We have one table per month, so I would like to avoid doing this for a whole year:

create table t_2019 as
select * from
(select * from t_jan where...
union all
select * from t_feb where...
union all
select * from t_mar where...);

Does Hive (or any kind of SQL) allow looping through tables? I've seen FOR loop and WHILE examples in T-SQL, but those run individual queries. In this case I want to union the tables.

@t_list = ('t_jan', 't_feb', 't_mar'...etc)

Then, how do I query each table in @t_list and "union all" the results? Each month has about 800k rows, so it's big, but Hive can handle it.

You can solve this problem with a single partitioned Hive table instead of multiple tables.

For example: a table table_whole pointing to the HDFS path hdfs://path/to/whole/, partitioned by year and month.
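A minimal sketch of what such a table definition might look like. The column names (id, amount) and the ORC storage format are assumptions for illustration only; the real columns should match the schema of the monthly tables.

```sql
-- Hypothetical schema: id and amount are placeholder columns, not from the question.
-- The partition columns (year, month) are declared separately from the data columns.
CREATE EXTERNAL TABLE IF NOT EXISTS table_whole (
  id     BIGINT,
  amount DOUBLE
)
PARTITIONED BY (year STRING, month STRING)
STORED AS ORC                          -- assumed format; use whatever the data is stored in
LOCATION 'hdfs://path/to/whole/';
```

Each (year, month) combination is stored in its own subdirectory under the table location, so a query that filters on these columns only has to read the matching directories.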

Now you can get the data for all months of 2019 with:
select * from table_whole where year = '2019' 


If you need data from just one month, say January 2019, you can filter on that partition:
select * from table_whole where year = '2019' and month='JAN'
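
To get there from the existing monthly tables, each one could be copied into its own partition once. This is only a sketch: it assumes the columns of t_jan, t_feb, etc. match the non-partition columns of table_whole, in the same order.

```sql
-- Static-partition inserts: the partition values are given explicitly,
-- so SELECT * supplies only the regular (non-partition) columns.
-- Add the WHERE filter from the original query where needed.
INSERT OVERWRITE TABLE table_whole PARTITION (year = '2019', month = 'JAN')
SELECT * FROM t_jan;

INSERT OVERWRITE TABLE table_whole PARTITION (year = '2019', month = 'FEB')
SELECT * FROM t_feb;

-- ...repeat for the remaining months.
```

This still needs one statement per month, but it is a one-time load; afterwards a single query with a filter on year replaces the chain of union all statements.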


 