
Are flat files (ORC, CSV) more efficient than Delta tables in Spark?

I am working with around 16 Delta tables, each with roughly 1 to 3 million rows, in Databricks.
When I perform an operation like a join followed by a delete or insert on these Delta tables, it takes a long time.
My workload is mostly inserts and deletes. Should I use flat files instead of Delta tables, or should I try MERGE with the Delta tables instead?

This left me wondering: what are the advantages of Delta, and why not just use flat files?

It's a basic question, but I am still new to Databricks, so any help would be appreciated.

Andy, it totally depends on your needs and expectations, but Delta tables help with many data-engineering challenges.

Delta tables maintain a transaction log, which enables features such as time travel: you can roll back, reproduce experiments by reading an older version of the data, and analyze the differences between data versions (what changed).
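As a minimal sketch of what time travel looks like in PySpark (the table path and the comparison at the end are placeholders, not from your setup):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder path to a Delta table
table_path = "/mnt/data/events"

# Read the current version of the table
current_df = spark.read.format("delta").load(table_path)

# Read an older snapshot by version number (or use "timestampAsOf")
old_df = spark.read.format("delta").option("versionAsOf", 3).load(table_path)

# Example analysis: rows that exist now but not in version 3
added_rows = current_df.exceptAll(old_df)
```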

Also, even though Delta stores its data as Parquet underneath, updates and deletes don't require rewriting the full dataset; only the data files affected by the change are rewritten.
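For your insert/delete workload, a MERGE against the Delta table is usually the idiomatic way to express this. Here is a hedged sketch using the Delta Lake Python API; the path, the schema (id, value, op), and the sample data are all invented for illustration:

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# Incoming changes: placeholder schema with an "op" column marking the action
updates_df = spark.createDataFrame(
    [(1, "a", "upsert"), (2, None, "delete")],
    ["id", "value", "op"],
)

# Placeholder path to the target Delta table
target = DeltaTable.forPath(spark, "/mnt/data/events")

(target.alias("t")
    .merge(updates_df.alias("s"), "t.id = s.id")
    .whenMatchedDelete(condition="s.op = 'delete'")     # remove matched rows flagged for deletion
    .whenMatchedUpdateAll(condition="s.op = 'upsert'")   # update remaining matched rows
    .whenNotMatchedInsertAll()                           # insert brand-new rows
    .execute())
```

A single MERGE like this replaces a separate join + delete + insert sequence, and Delta only rewrites the files it touches.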

If you don't need any of this, then maybe you can skip Delta tables and focus on pure performance.
