How to query data from multiple Hive tables having a similar naming pattern?

It is my maiden voyage into Hive. I have multiple Hive tables, such as snapshots, named as follows:

revenue_20110131
revenue_20110228
revenue_20110331

purchases_qrt1
purchases_qrt2
purchases_qrt3
purchases_qrt4

I have a lot of such snapshot tables. Now I need to build a script that takes part of a table name as a parameter, reads the records from all similarly named tables, and exports the data from all of those tables into a single ORC file.

How can I do this in Hive? I have no idea where to start, as I've never worked with Hive before. Can someone please help me? Thanks in advance, guys.

If the tables' locations share a common parent directory, you can create a new table whose location is that parent directory and then read all of them with a single select.

create table new_tbl
...
location 'upper common directory path here';

Then add these settings before the select, so that Hive reads the subdirectories recursively:

set hive.mapred.supports.subdirectories=TRUE;
set mapred.input.dir.recursive=TRUE;
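
With those settings in place, the combined data could be written out as ORC with a create-table-as-select. This is only a rough sketch: revenue_all_orc is a made-up name for the output table, and it assumes new_tbl has already been created over the common parent directory and that all the snapshot tables share the same columns and file format. Note that Hive may write more than one ORC file under the new table's location, depending on how the job runs.

```sql
-- Read subdirectories of the table location recursively.
set hive.mapred.supports.subdirectories=TRUE;
set mapred.input.dir.recursive=TRUE;

-- revenue_all_orc is a hypothetical name for the combined output table.
-- new_tbl is assumed to point at the common parent directory of the snapshots.
create table revenue_all_orc
stored as orc
as
select * from new_tbl;
```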
