
Extension of compressed parquet file in Spark

In my Spark job, I write a compressed parquet file like this:

df
  .repartition(numberOutputFiles)
  .write
  .option("compression","gzip")
  .mode(saveMode)
  .parquet(avroPath)

Then my output files have the extension .gz.parquet (e.g. file_name.gz.parquet).

How can I get ".parquet.gz" instead?

I don't believe you can. The file extension is hard-coded in ParquetWrite.scala as the concatenation of the codec's extension and ".parquet":

  // ...
  override def getFileExtension(context: TaskAttemptContext): String = {
    CodecConfig.from(context).getCodec.getExtension + ".parquet"
  }
  // ...
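If you really need the other ordering, one option is to rename the files after the write finishes. The sketch below uses the Hadoop FileSystem API to rename every part file under the output directory; `spark` (the SparkSession) and `outputPath` are assumed to be in scope, and the suffix swap itself is a hypothetical workaround, not a Spark feature. Note that Parquet readers rely on file metadata rather than the extension, but other tools may misinterpret a trailing .gz as a gzipped container, so test your downstream consumers first.

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Rename part files from *.gz.parquet to *.parquet.gz after the write.
// Assumes `spark` is an active SparkSession and `outputPath` is the
// directory passed to df.write.parquet(...).
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

fs.listStatus(new Path(outputPath))
  .map(_.getPath)
  .filter(_.getName.endsWith(".gz.parquet"))
  .foreach { src =>
    val newName = src.getName.stripSuffix(".gz.parquet") + ".parquet.gz"
    fs.rename(src, new Path(src.getParent, newName))
  }
```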

