
What are the major differences between S3 Lake Formation governed tables and Databricks Delta tables?

What are the major differences between S3 Lake Formation governed tables and Databricks Delta tables? They look pretty similar.

Governed tables, Delta Lake, and to some extent Apache Iceberg and Hudi are all table formats. Instead of storing data only as raw files (Parquet, ORC, Avro), each of them keeps additional manifest files that record which data files belong to the table in a given state. That metadata is what lets all of them offer features such as ACID transactions, time travel, and snapshots.
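To make the manifest idea concrete, here is a minimal toy sketch in plain Python. It is not the on-disk layout of Delta Lake, governed tables, Iceberg, or Hudi; the `ToyTable` class, the `_log` directory, and the file names are all made up for illustration. It only shows how an append-only log of "add" and "remove" entries is enough to reconstruct which files make up the table at any version.

```python
import json
import tempfile
from pathlib import Path


class ToyTable:
    """Hypothetical toy table: data files plus an append-only commit log."""

    def __init__(self, root):
        self.log_dir = Path(root) / "_log"
        self.log_dir.mkdir(parents=True, exist_ok=True)

    def _versions(self):
        # Each commit is one JSON file named after its version number.
        return sorted(int(p.stem) for p in self.log_dir.glob("*.json"))

    def commit(self, add=(), remove=()):
        """Record one change as a new log entry and return its version."""
        versions = self._versions()
        version = versions[-1] + 1 if versions else 0
        entry = {"add": list(add), "remove": list(remove)}
        path = self.log_dir / f"{version:08d}.json"
        # Mode "x" raises FileExistsError if this version was already written,
        # so on a local file system two writers cannot claim the same version.
        with open(path, "x") as f:
            json.dump(entry, f)
        return version

    def files_at(self, version=None):
        """Replay the log up to `version` (default: latest) to list live files."""
        live = set()
        for v in self._versions():
            if version is not None and v > version:
                break
            entry = json.loads((self.log_dir / f"{v:08d}.json").read_text())
            live -= set(entry["remove"])
            live |= set(entry["add"])
        return sorted(live)


with tempfile.TemporaryDirectory() as tmp:
    table = ToyTable(tmp)
    v0 = table.commit(add=["part-0.parquet", "part-1.parquet"])
    # A rewrite replaces part-0 with part-2; the old file is only logically removed.
    v1 = table.commit(add=["part-2.parquet"], remove=["part-0.parquet"])
    latest = table.files_at()
    as_of_v0 = table.files_at(version=v0)

print(latest)
print(as_of_v0)
```

Reading the table "as of" version 0 is just a replay of the log that stops early, which is roughly why time travel and snapshots come almost for free once the manifest exists.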

The main difference right now is which big data tools each one integrates with.

AWS governed tables integrate tightly with the rest of AWS. They can use the Lake Formation permission model to govern access to Data Catalog objects (databases, tables, and columns). They can also be queried with the AWS query engines Redshift Spectrum and Athena. Spark is not yet supported.
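As a rough illustration of column-level governance, the sketch below builds the request for a Lake Formation `GrantPermissions` call that gives a principal `SELECT` on two columns of a table. The helper name `build_select_grant`, the role ARN, and the database, table, and column names are all hypothetical; only the request shape (`Principal`, `Resource.TableWithColumns`, `Permissions`) follows the Lake Formation API.

```python
def build_select_grant(principal_arn, database, table, columns):
    """Build keyword arguments for a column-level SELECT grant.

    Hypothetical helper: it only assembles the request dictionary and
    does not talk to AWS.
    """
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


# All identifiers below are placeholders, not real resources.
grant = build_select_grant(
    "arn:aws:iam::111122223333:role/analyst",
    "sales_db",
    "orders",
    ["order_id", "order_date"],
)
print(grant)
```

With boto3 installed and credentials configured, the dictionary could be passed as `boto3.client("lakeformation").grant_permissions(**grant)`; that step is left out here so the sketch runs without an AWS account.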

Delta Lake provides ACID transactions, time travel, and snapshots on Spark. It also supports Spark streaming and data mutation (updates, deletes, and merges).

What, then, is the difference between Glue tables and governed tables, and how do both compare with Hudi, Iceberg, and Delta Lake?

Glue tables also let you query Parquet files on S3 from Athena, Redshift Spectrum, Glue, and Spark jobs.

