
What are the major differences between S3 lake formation governed tables and databricks delta tables?

What are the major differences between S3 Lake Formation governed tables and Databricks Delta tables? They look pretty similar.

Governed tables, Delta Lake, and to some extent also Apache Iceberg and Hudi are all table formats. Instead of storing data only as raw files (Parquet, ORC, Avro), they each keep additional manifest files that provide metadata about which data files belong to the table at a given state. This is what lets all of them offer features like ACID transactions, time travel, and snapshotting.
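To make the manifest idea concrete, here is a minimal toy sketch in Python. It is not how any of these formats is actually implemented; the class name ToyTable, its methods, and the file names are all hypothetical and chosen only for illustration. It shows how keeping one immutable manifest per commit gives you snapshots and time travel without rewriting untouched data files.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Tuple


@dataclass
class ToyTable:
    """Hypothetical toy table: a log of immutable manifests.

    Each manifest lists the data files visible at that version.
    Version 0 is the empty table.
    """
    manifests: List[Tuple[str, ...]] = field(default_factory=lambda: [()])

    def commit(self, add: Iterable[str] = (), remove: Iterable[str] = ()) -> int:
        """Write a new manifest as one step and return its version number."""
        current = self.manifests[-1]
        removed = set(remove)
        missing = removed - set(current)
        if missing:
            # Reject the whole commit; the previous manifest stays valid.
            raise ValueError(f"cannot remove unknown files: {sorted(missing)}")
        new_manifest = tuple(f for f in current if f not in removed) + tuple(add)
        self.manifests.append(new_manifest)
        return len(self.manifests) - 1

    def files(self, version: Optional[int] = None) -> Tuple[str, ...]:
        """Return the data files visible at a version (latest by default)."""
        if version is None:
            version = len(self.manifests) - 1
        return self.manifests[version]


table = ToyTable()
v1 = table.commit(add=["part-000.parquet", "part-001.parquet"])
# A rewrite of part-000 is modeled as removing it and adding a new file.
v2 = table.commit(add=["part-002.parquet"], remove=["part-000.parquet"])

print(table.files())    # latest snapshot
print(table.files(v1))  # "time travel" to the earlier snapshot
```

Readers only ever look at one manifest, so they see either the table before a commit or after it, never a half-written state. Real formats add much more on top of this (concurrency control, schema and statistics, cleanup of old files).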

The main difference right now is which big data tools they can integrate with.

AWS governed tables have tight integration with the rest of AWS. They can easily leverage the Lake Formation permission model to govern access to Data Catalog objects (databases, tables, and columns). They can also be queried with AWS query engines: Redshift Spectrum and Athena. Spark is not yet supported.

Delta Lake provides ACID transactions, time travel, and snapshotting on Spark. It also supports Spark streaming and data mutation.

What then would be the difference between Glue tables and governed tables, and between those and Hudi, Iceberg, and Delta Lake?

Glue tables also let you query Parquet files in S3 from Athena, Redshift Spectrum, Glue, and Spark jobs.

