Does DROP PARTITION delete data from an external table in Hive?

An external table in Hive is partitioned on year, month and day.

Does the following query delete the data of the external table for the specific partition referenced in the query?

ALTER TABLE MyTable DROP IF EXISTS PARTITION(year=2016,month=7,day=11);

The partitioning scheme is not data. It is part of the table DDL stored in metadata (put simply: the partition key values plus the location where the data files are stored).

The data itself is stored in files in the partition location (folder). If you drop a partition of an external table, the location remains untouched but is unmounted as a partition (the metadata about this partition is deleted). You can keep several versions of a partition location unmounted (for example, previous versions).

You can drop a partition and mount another location as the partition (alter table add partition), or change the existing partition's location. Likewise, dropping an external table does not delete the table or partition folders or the files in them, and you can later create a table on top of this location.
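The drop-and-remount flow described above can be sketched as follows, using the question's MyTable. The HDFS paths and the namenode address are hypothetical, and the behaviour noted in the comments assumes MyTable really is an external table.

```sql
-- Remove only the partition's metadata; for an external table the
-- files under the partition's location are expected to stay in place.
ALTER TABLE MyTable DROP IF EXISTS PARTITION (year=2016, month=7, day=11);

-- Mount a different directory as the same partition.
-- The path is a hypothetical example, not taken from the question.
ALTER TABLE MyTable ADD IF NOT EXISTS PARTITION (year=2016, month=7, day=11)
LOCATION '/data/mytable_v2/year=2016/month=7/day=11';

-- Alternatively, repoint an existing partition without dropping it first.
-- SET LOCATION generally expects a full URI; the host and port are assumptions.
ALTER TABLE MyTable PARTITION (year=2016, month=7, day=11)
SET LOCATION 'hdfs://namenode:8020/data/mytable_v3/year=2016/month=7/day=11';
```

Neither statement moves or copies files; each only changes which directory the metastore associates with the partition.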

For a better understanding of the external table/partition concept, have a look at this answer: it is possible to create many tables (both managed and external at the same time) on top of the same location in HDFS.
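As an illustration of several tables sharing one location, here is a minimal sketch. The table names, columns and the /data/events directory are all hypothetical.

```sql
-- Two external tables reading the same (hypothetical) HDFS directory,
-- each applying its own schema to the same files.
CREATE EXTERNAL TABLE events_raw (line STRING)
LOCATION '/data/events';

CREATE EXTERNAL TABLE events_parsed (event_id STRING, payload STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/events';

-- Dropping one table removes only its metadata;
-- the files stay, so the other table keeps working.
DROP TABLE events_raw;
```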

No. An external table holds only references to its data. Those references are deleted, while the actual files still persist at the location.

External table data files are not owned by the table, nor are they moved to the Hive warehouse directory.

Only the partition metadata is deleted from the Hive metastore tables.
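One way to check this on your own cluster is sketched below. The directory /data/mytable is a hypothetical table location, and the `dfs` line assumes you are running inside the Hive CLI, which passes such commands through to HDFS.

```sql
-- Confirm the table type; an external table is reported as EXTERNAL_TABLE.
DESCRIBE FORMATTED MyTable;

SHOW PARTITIONS MyTable;   -- the partition is listed while it is registered
ALTER TABLE MyTable DROP IF EXISTS PARTITION (year=2016, month=7, day=11);
SHOW PARTITIONS MyTable;   -- the dropped partition should no longer be listed

-- List the table directory to confirm that the files are still there.
dfs -ls -R /data/mytable;

-- Re-register partitions whose directories follow the key=value layout.
MSCK REPAIR TABLE MyTable;
```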

Difference between internal and external tables:

For external tables:

An external table's files are stored on HDFS, but the table is not fully tied to those source files.

If you delete an external table, the files still remain on HDFS.

For example, if you create an external table called "table_test" in Hive using HiveQL and link the table to the file "file", then deleting "table_test" from Hive will not delete "file" from HDFS.
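The "table_test" example could look roughly like this. The columns, the delimiter and the directory are assumptions made for illustration. Note that LOCATION names a directory rather than a single file, so "file" would sit inside it.

```sql
-- Hypothetical schema; /data/table_test is assumed to contain "file".
CREATE EXTERNAL TABLE table_test (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/table_test';

-- Removes the table definition from the metastore only;
-- the contents of /data/table_test are left on HDFS.
DROP TABLE table_test;
```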

External table files are accessible to anyone who has access to the HDFS file structure, so security needs to be managed at the HDFS file/folder level.

Metadata is maintained in the Hive metastore (typically hosted on the master node), so deleting an external table from Hive deletes only the metadata, not the data files.

For internal tables:

Internal tables are stored in a directory determined by the hive.metastore.warehouse.dir setting. By default this is "/user/hive/warehouse"; you can change it by updating the location in the config file.

Deleting the table deletes both the metadata (from the metastore) and the data (from HDFS).

Internal table file security is controlled solely via Hive. Security needs to be managed within Hive, probably at the schema level (this varies from organisation to organisation).
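For comparison with the external case, a minimal sketch of the internal (managed) table lifecycle follows. The table name and columns are hypothetical.

```sql
-- Print the current warehouse directory setting.
SET hive.metastore.warehouse.dir;

-- No EXTERNAL keyword and no LOCATION: Hive creates a managed table
-- in a directory under the warehouse location.
CREATE TABLE managed_test (id INT, name STRING);

-- The output reports the table type (MANAGED_TABLE) and its location.
DESCRIBE FORMATTED managed_test;

-- Removes the metadata and the table's data directory.
-- If HDFS trash is enabled the files go to the trash first;
-- DROP TABLE ... PURGE skips the trash.
DROP TABLE managed_test;
```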

Hive tables can be internal or external; the choice affects how data is loaded, controlled, and managed.

Use EXTERNAL tables when:

The data is also used outside of Hive. For example, the data files are read and processed by an existing program that doesn't lock the files.

The data needs to remain in the underlying location even after a DROP TABLE. This can apply if you are pointing multiple schemas (tables or views) at a single data set, or if you are iterating through various possible schemas.

Hive should not own the data or control settings, directories, etc.; you may have another program or process that does those things.

You are not creating a table based on an existing table (AS SELECT).

Use INTERNAL tables when:

The data is temporary.

You want Hive to completely manage the life-cycle of the table and data.
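A temporary, Hive-managed table built from an existing table might be sketched like this. The staging table name is hypothetical; MyTable and its partition columns come from the question.

```sql
-- CREATE TABLE ... AS SELECT produces a managed (internal) table
-- whose files Hive owns.
CREATE TABLE mytable_staging AS
SELECT *
FROM MyTable
WHERE year = 2016 AND month = 7;

-- When the staging data is no longer needed, dropping the table
-- removes both its metadata and its files.
DROP TABLE mytable_staging;
```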

Note: if you look into the metastore database (the one configured for your Hive installation), you will find metadata tables such as:

| BUCKETING_COLS     |
| COLUMNS            |
| DBS                |
| NUCLEUS_TABLES     |
| PARTITIONS         |
| PARTITION_KEYS     |
| PARTITION_KEY_VALS |
| PARTITION_PARAMS   |
| SDS                |
| SD_PARAMS          |
| SEQUENCE_TABLE     |
| SERDES             |
| SERDE_PARAMS       |
| SORT_COLS          |
| TABLE_PARAMS       |
| TBLS               | 
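To see what a partition entry looks like in these tables, you could run a read-only query like the one below against the metastore's backing database (not through Hive itself). The column names reflect the commonly used metastore schema and are an assumption here; they may differ between Hive versions, so check your own schema first.

```sql
-- Lists each registered partition of a table with the directory it points to.
-- Hive stores table names in lower case in the metastore.
SELECT d.NAME AS db_name,
       t.TBL_NAME,
       t.TBL_TYPE,      -- e.g. EXTERNAL_TABLE or MANAGED_TABLE
       p.PART_NAME,     -- e.g. year=2016/month=7/day=11
       s.LOCATION
FROM PARTITIONS p
JOIN TBLS t ON p.TBL_ID = t.TBL_ID
JOIN DBS  d ON t.DB_ID  = d.DB_ID
JOIN SDS  s ON p.SD_ID  = s.SD_ID
WHERE t.TBL_NAME = 'mytable';
```

After a DROP PARTITION on an external table, the matching row should disappear from this result even though the directory it pointed to still exists.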
