
Spark write performance: CSV vs snappy-ORC

If I need to write a DataFrame to disk, which format will perform better: CSV or ORC with Snappy?

On one hand, the CSV format avoids compression overhead; on the other hand, Snappy reduces the total number of bytes the write task has to produce. Please correct my assumptions here as well.

Note that my question is about write performance, not the storage point of view.
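For what it's worth, a minimal way to time the two writes directly might look like the sketch below. It assumes an existing SparkSession `spark` and DataFrame `df` (neither is given in the question), and uses placeholder output paths; caching the input first keeps recomputation from skewing the numbers.

```scala
// Hypothetical micro-benchmark sketch, not an authoritative measurement.
df.cache().count() // materialize df so the timed runs measure only the write

def timeWrite(label: String)(write: => Unit): Unit = {
  val start = System.nanoTime()
  write
  println(f"$label%-12s ${(System.nanoTime() - start) / 1e6}%.0f ms")
}

timeWrite("csv") {
  df.write.mode("overwrite").csv("/tmp/bench/csv") // placeholder path
}

timeWrite("orc+snappy") {
  df.write.mode("overwrite")
    .option("compression", "snappy") // Snappy is already Spark's default ORC codec
    .orc("/tmp/bench/orc")           // placeholder path
}
```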

Compression is about saving space, not performance, so the fact that you're using Snappy isn't really the relevant detail; you could just as well use LZ4 or ZSTD instead, for example.
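To illustrate that point: in Spark the codec is just a writer option, so swapping it changes nothing about the format itself. A sketch, again assuming a DataFrame `df` and placeholder paths (zstd support for ORC depends on your Spark/ORC versions):

```scala
// Same ORC format, different codecs; only the option value changes.
df.write.mode("overwrite").option("compression", "zlib").orc("/tmp/out/orc_zlib")
df.write.mode("overwrite").option("compression", "zstd").orc("/tmp/out/orc_zstd") // needs a recent Spark/ORC
df.write.mode("overwrite").option("compression", "none").orc("/tmp/out/orc_none") // uncompressed ORC
```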

ORC is a column-oriented data format that performs better for analytics than CSV and, under certain conditions, will outperform Spark's default format, Parquet.
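As a rough illustration of why the columnar layout helps analytics, comparing the physical plans shows the ORC scan reading only the selected column and pushing the filter down, while the CSV scan still has to read the full text of every row. The column name `user_id` and the paths here are hypothetical:

```scala
import org.apache.spark.sql.functions.col

// Hypothetical single-column query against both copies of the data.
val orcDf = spark.read.orc("/tmp/bench/orc")
orcDf.select("user_id").where(col("user_id") > 100).explain()
// -> ReadSchema lists only user_id, with PushedFilters: [GreaterThan(user_id,100)]

val csvDf = spark.read.option("header", "true").option("inferSchema", "true").csv("/tmp/bench/csv")
csvDf.select("user_id").where(col("user_id") > 100).explain()
// -> the full text of every row must still be read from disk before pruning
```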
