In my Spark job, I write a compressed parquet file like this:
df
  .repartition(numberOutputFiles)
  .write
  .option("compression", "gzip")
  .mode(saveMode)
  .parquet(avroPath)
The output files then have the extension file_name.gz.parquet.
How can I get ".parquet.gz" instead?
I don't believe you can. The file extension is hardcoded in ParquetWrite.scala
as the codec's extension concatenated with ".parquet":
override def getFileExtension(context: TaskAttemptContext): String = {
CodecConfig.from(context).getCodec.getExtension + ".parquet"
}
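Since the extension is fixed at write time, one possible workaround is to rename the part files after the write finishes. Parquet records the compression codec in the file footer, so the extension is cosmetic and renaming does not affect readability. The sketch below uses the Hadoop FileSystem API; `spark` and `outputPath` are assumed to exist in your job, and the suffix-rewriting logic is just string manipulation:

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical post-write rename: assumes `spark` (SparkSession) and
// `outputPath` (the directory passed to .parquet(...)) are in scope.
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

fs.listStatus(new Path(outputPath))
  .map(_.getPath)
  .filter(_.getName.endsWith(".gz.parquet"))
  .foreach { p =>
    // Swap the ".gz.parquet" suffix for ".parquet.gz"
    val renamed = p.getName.stripSuffix(".gz.parquet") + ".parquet.gz"
    fs.rename(p, new Path(p.getParent, renamed))
  }
```

Note that downstream readers that discover files by extension (e.g. a glob on `*.parquet`) would need to be updated to match the new suffix.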