What are the major differences between S3 Lake Formation governed tables and Databricks Delta tables?
They look pretty similar.
Governed tables, Delta Lake, and to some extent Apache Iceberg and Hudi are all table formats. Instead of a table being just a set of raw data files (Parquet, ORC, Avro), each of them keeps additional manifest metadata recording which data files make up the table at a given state (version). This is what allows all of them to offer features such as ACID transactions, time travel, and snapshots.
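To make the manifest idea concrete, here is a toy sketch in Python. It is not the on-disk layout of Delta Lake, Iceberg, Hudi, or governed tables; the class ToyTableLog and its methods are hypothetical. Each commit records which data files were added and removed, and replaying the commits up to a version yields that version's snapshot, which is the basic mechanism behind time travel. An update is modeled as rewriting a file: the old file is removed and a new one added in the same commit.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTableLog:
    """Hypothetical, simplified table log: each commit lists files added and removed."""

    commits: list = field(default_factory=list)

    def commit(self, add=(), remove=()):
        """Append one commit and return its version number (0-based)."""
        self.commits.append({"add": list(add), "remove": list(remove)})
        return len(self.commits) - 1

    def snapshot(self, version=None):
        """Replay commits 0..version and return the sorted list of live data files."""
        if version is None:
            version = len(self.commits) - 1  # default: latest version
        if not 0 <= version < len(self.commits):
            raise ValueError(f"no such version: {version}")
        files = set()
        for entry in self.commits[: version + 1]:
            files.difference_update(entry["remove"])  # files dropped by this commit
            files.update(entry["add"])                # files introduced by this commit
        return sorted(files)


if __name__ == "__main__":
    log = ToyTableLog()
    log.commit(add=["a.parquet", "b.parquet"])             # version 0: initial load
    log.commit(add=["c.parquet"])                          # version 1: append
    log.commit(add=["b2.parquet"], remove=["b.parquet"])   # version 2: "update" rewrites b
    print(log.snapshot())    # latest version
    print(log.snapshot(0))   # time travel to version 0
```

Because readers only see files listed by committed versions, a half-written data file never shows up in a snapshot. Real formats also need an atomic commit step (something like a conditional write of the next log entry) to keep concurrent writers isolated, which this toy does not model.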
The main difference right now is which big data tools each of them integrates with.
AWS Lake Formation governed tables are tightly integrated with the rest of AWS. They can use the Lake Formation permission model to govern access to Data Catalog objects (databases, tables, and columns). They can also be queried with the AWS query engines Redshift Spectrum and Athena. Spark is not yet supported.
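As a rough illustration of the column-level permission model, the sketch below builds the arguments for the Lake Formation GrantPermissions API as exposed by boto3 (client.grant_permissions). The helper names build_column_grant and grant_column_access are hypothetical, and the exact request shape should be checked against the boto3 Lake Formation reference before use.

```python
def build_column_grant(principal_arn, database, table, columns, permissions=("SELECT",)):
    """Build keyword arguments for Lake Formation GrantPermissions,
    scoping access to specific columns of one Data Catalog table."""
    if not columns:
        raise ValueError("a column-level grant needs at least one column")
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": list(permissions),
    }


def grant_column_access(client, principal_arn, database, table, columns, permissions=("SELECT",)):
    """Send the grant through a Lake Formation client (e.g. boto3.client("lakeformation"))."""
    request = build_column_grant(principal_arn, database, table, columns, permissions)
    return client.grant_permissions(**request)
```

A caller would create the client with boto3.client("lakeformation") and pass it in along with the principal ARN, database, table, and column names; all of those identifiers are placeholders here. Keeping the request construction in a separate function makes it easy to inspect or test without AWS credentials.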
Delta Lake provides ACID transactions, time travel, and snapshots on Spark. It also supports Spark Structured Streaming and data mutation (UPDATE, DELETE, and MERGE).
What, then, is the difference between Glue tables and governed tables, and how do both compare with Hudi, Iceberg, and Delta Lake?
Glue tables also let you query Parquet files on S3 from Athena, Redshift Spectrum, Glue, and Spark jobs.