
Extension of compressed parquet file in Spark

In my Spark job, I write a compressed parquet file like this:

df
  .repartition(numberOutputFiles)
  .write
  .option("compression","gzip")
  .mode(saveMode)
  .parquet(avroPath)

Then my output files have the extension .gz.parquet (e.g. file_name.gz.parquet).

How can I get ".parquet.gz" instead?

I don't believe you can. The file extension is hard-coded in ParquetWrite.scala as the concatenation of the codec's extension and ".parquet":

  // ...
  override def getFileExtension(context: TaskAttemptContext): String = {
    CodecConfig.from(context).getCodec.getExtension + ".parquet"
  }
  // ...
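If you really need the other ordering, one option is to rename the files after the write finishes. The sketch below uses the Hadoop FileSystem API to rename every part file under the output directory; `spark` (the SparkSession) and `outputPath` are assumed to be in scope, and the suffix swap itself is a hypothetical workaround, not a Spark feature. Note that Parquet readers rely on file metadata rather than the extension, but other tools may misinterpret a trailing .gz as a gzipped container, so test your downstream consumers first.

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Rename part files from *.gz.parquet to *.parquet.gz after the write.
// Assumes `spark` is an active SparkSession and `outputPath` is the
// directory passed to df.write.parquet(...).
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

fs.listStatus(new Path(outputPath))
  .map(_.getPath)
  .filter(_.getName.endsWith(".gz.parquet"))
  .foreach { src =>
    val newName = src.getName.stripSuffix(".gz.parquet") + ".parquet.gz"
    fs.rename(src, new Path(src.getParent, newName))
  }
```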

