
Hive table re-created before every data load

I have seen applications drop an external table, create it again, load the data, and then run the MSCK command on every data load. What is the benefit of dropping and re-creating the table every time?

There is no benefit in dropping and re-creating an EXTERNAL table, because dropping the table leaves the data intact.

There may be a benefit in dropping and re-creating a MANAGED table, though, because dropping it deletes the data as well.

One possible scenario if you are running on S3:

Dropping the files early, before the load completes rather than at load time, may reduce the chance of eventual consistency issues in S3 after the load.

First, once the files are dropped, you may hit an eventual consistency (EC) issue when reading the table, immediately after the drop and for some time afterwards. Dropping the files early gives S3 time to synchronize before the table is read.

Second, there is the eventual consistency issue that arises when you write files with the same names (overwriting). Dropping early may help, though it is better to use GUID-prefixed (unique) file names, a timestamp in the partition folder path, or a similar technique to solve this kind of problem (eventual consistency after overwriting).
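As a rough illustration of the unique-naming idea, here is a minimal Python sketch. The function names, the `load_date` / `load_ts` partition keys, the `-part` suffix and the example prefix are all assumptions made up for this example; they are not from the original application or from any Hive or S3 API.

```python
import uuid
from datetime import datetime, timezone


def unique_file_key(table_prefix, partition, extension="parquet"):
    """Build an object key whose file name starts with a fresh GUID.

    Because every call yields a new name, a reload never overwrites an
    existing object. (Hypothetical helper, for illustration only.)
    """
    guid = uuid.uuid4().hex
    return f"{table_prefix.rstrip('/')}/{partition}/{guid}-part.{extension}"


def timestamped_partition(load_date, now=None):
    """Build a partition folder path that embeds the load timestamp.

    Each load writes into its own folder, so files from a previous load
    of the same date are never rewritten. (Hypothetical helper; the
    partition key names are assumptions.)
    """
    now = now or datetime.now(timezone.utc)
    return f"load_date={load_date}/load_ts={now.strftime('%Y%m%dT%H%M%SZ')}"


if __name__ == "__main__":
    partition = timestamped_partition("2024-01-01")
    print(unique_file_key("warehouse/sales", partition))
```

With either approach the new load never reuses an existing key, so a reader cannot be served a stale copy of a file that was overwritten under the same name.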
