
Exclude certain S3 folders while creating a view on Hive

I am trying to create a view on Hive that fetches data from an S3 bucket while excluding certain folders inside it. I was able to create the view on Athena, but couldn't do the same on Hive.

Athena view:

CREATE VIEW test
AS
SELECT *
FROM TABLE_A
WHERE NOT ("$path" LIKE '%PASSENGER_DATA%')
AND NOT ("$path" LIKE '%CUSTOMER_DATA%');

Could you please advise how the same could be achieved on Hive?

Hive does not have Athena's "$path" pseudo-column, so the same filter cannot be used as-is. However, depending on the version you are using, you could use Ranger to exclude the data so that it is not shown.

If you must do it with a view, try something like:

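-- Table over the folder to exclude; the column definitions and the full
-- S3 location are placeholders here and must be filled in.
CREATE TABLE filter_out [blah blah blah]
LOCATION '%CUSTOMER_DATA%';

-- Keep only the rows of TABLE_A whose ID does not appear in filter_out.
SELECT *
FROM TABLE_A
WHERE NOT EXISTS (SELECT ID FROM filter_out WHERE TABLE_A.ID = filter_out.ID);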
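
Hive has no "$path" pseudo-column as such, but its documentation does describe a virtual column, INPUT__FILE__NAME, that holds the name of the input file for each row. A path filter closer to the Athena view might therefore be possible. The following is only a sketch: it assumes TABLE_A is defined over the parent S3 prefix, that Hive actually reads the nested folders for that table, and that your Hive version accepts the virtual column inside a view definition. None of this has been verified against a specific release.

```sql
-- Sketch only: filter rows by the file they were read from.
-- Assumes TABLE_A covers the parent prefix that contains the
-- PASSENGER_DATA and CUSTOMER_DATA folders.
CREATE VIEW test AS
SELECT *
FROM TABLE_A
WHERE INPUT__FILE__NAME NOT LIKE '%PASSENGER_DATA%'
  AND INPUT__FILE__NAME NOT LIKE '%CUSTOMER_DATA%';
```

Note that a filter like this is applied to rows as they are read, so Hive may still scan the excluded files before discarding their rows.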

You may actually wish to consider moving the data into its own folders. Then you could define a table over each folder and combine them in a view:

CREATE VIEW TABLE_DATA
AS
SELECT *
FROM TABLE_A -- table over the PASSENGER_DATA folder
UNION
SELECT *
FROM TABLE_B; -- table over the CUSTOMER_DATA folder

This will likely also make your permission issues easier to manage.

And when needed, you could easily use one table or both tables.
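
As a rough illustration of the one-table-per-folder layout, the two tables could be declared as external tables, each pointing at its own folder. Everything specific here is hypothetical: the bucket name example-bucket, the s3a:// scheme, the payload column, and the Parquet format are placeholders, and only the ID column and the table names come from the snippets above.

```sql
-- Hypothetical layout: bucket name, column list, and file format are placeholders.
CREATE EXTERNAL TABLE TABLE_A (
  ID      BIGINT,
  payload STRING
)
STORED AS PARQUET
LOCATION 's3a://example-bucket/PASSENGER_DATA/';

CREATE EXTERNAL TABLE TABLE_B (
  ID      BIGINT,
  payload STRING
)
STORED AS PARQUET
LOCATION 's3a://example-bucket/CUSTOMER_DATA/';
```

Because the tables are external, dropping one of them removes only the Hive metadata and leaves the files in S3 untouched.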
