
Are flat files (ORC, CSV) more efficient than Delta tables in Spark?

I am working with around 16 Delta tables, each with roughly 1 to 3 million rows, in Databricks.
When I perform an operation like a join followed by a delete or insert on these Delta tables, it takes a long time.
My workload is mostly inserts and deletes. Should I use flat files instead of Delta tables, or should I try MERGE with the Delta tables instead?

This left me wondering: what are the advantages of Delta, and why not just use flat files?

It's a basic question, but I am still new to Databricks, so any help would be appreciated.

Andy, it totally depends on your needs and expectations, but Delta tables help with many data-engineering challenges.

Delta tables maintain a transaction log, which enables features such as time travel: you can roll back, reproduce experiments by reading an older version of the data, and analyze the differences between data versions (what changed).
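As a minimal sketch of what time travel looks like in PySpark (the table path and the comparison at the end are placeholders, not from your setup):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder path to a Delta table
table_path = "/mnt/data/events"

# Read the current version of the table
current_df = spark.read.format("delta").load(table_path)

# Read an older snapshot by version number (or use "timestampAsOf")
old_df = spark.read.format("delta").option("versionAsOf", 3).load(table_path)

# Example analysis: rows that exist now but not in version 3
added_rows = current_df.exceptAll(old_df)
```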

Also, even though Delta stores its data as Parquet underneath, updates and deletes don't require rewriting the full dataset; only the data files affected by the change are rewritten.
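For your insert/delete workload, a MERGE against the Delta table is usually the idiomatic way to express this. Here is a hedged sketch using the Delta Lake Python API; the path, the schema (id, value, op), and the sample data are all invented for illustration:

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# Incoming changes: placeholder schema with an "op" column marking the action
updates_df = spark.createDataFrame(
    [(1, "a", "upsert"), (2, None, "delete")],
    ["id", "value", "op"],
)

# Placeholder path to the target Delta table
target = DeltaTable.forPath(spark, "/mnt/data/events")

(target.alias("t")
    .merge(updates_df.alias("s"), "t.id = s.id")
    .whenMatchedDelete(condition="s.op = 'delete'")     # remove matched rows flagged for deletion
    .whenMatchedUpdateAll(condition="s.op = 'upsert'")   # update remaining matched rows
    .whenNotMatchedInsertAll()                           # insert brand-new rows
    .execute())
```

A single MERGE like this replaces a separate join + delete + insert sequence, and Delta only rewrites the files it touches.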

If you don't need any of this, then maybe you can skip Delta tables and focus on pure performance.
