
How to get Parquet file size and number of lines using Java?

I have created a Parquet file using Spark.

I need Parquet metadata such as the file size and the number of lines (rows) in it. Is there a way to get this information using the Spark library or Java?

You can use the Java File API from Scala to get the size:

import java.io.File

val file = new File("some.parquet")
val fileSize = file.length // size in bytes

This returns the size in bytes, which you can convert to other units as needed.
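One caveat: df.write.parquet("some.parquet") in Spark usually produces a directory named some.parquet containing one or more part files, and the Javadoc for File.length says the result is unspecified for a directory. The sketch below sums the data files instead. It assumes a hypothetical helper name, parquetSize, and assumes the data files end in .parquet (Spark's default naming). Under that assumption, _SUCCESS markers and .crc checksum files are skipped.

```scala
import java.io.File

// Hypothetical helper: on-disk size in bytes of Parquet data at `path`.
// A plain file is measured directly; for a directory (e.g. Spark output),
// only files ending in ".parquet" are summed, recursing into subdirectories.
def parquetSize(path: String): Long = {
  def walk(f: File): Long =
    if (f.isDirectory) Option(f.listFiles()).getOrElse(Array.empty[File]).map(walk).sum
    else if (f.getName.endsWith(".parquet")) f.length // assumed data-file suffix
    else 0L // _SUCCESS, .crc checksums, etc.

  val root = new File(path)
  if (root.isFile) root.length else walk(root)
}
```

For example, parquetSize("some.parquet") works whether some.parquet is a single file or a Spark output directory. Recursing into subdirectories also covers output written with partitionBy, which stores part files under column=value folders.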

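If you want to count the records, you need to load the file into Spark and get the count, for example with spark.read.parquet("some.parquet").count(). Keep in mind that Parquet is a binary columnar format, so a text line count is not the number of records. If you still want the raw number of lines, then: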

// Counts newline-separated chunks of the binary file, not Parquet records
val lineCount = io.Source.fromFile("some.parquet")(io.Codec.ISO8859).getLines().size
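If you do want that raw line count, the sketch below is a slightly more careful version. It uses a hypothetical helper, countLines. The helper closes the file when it is done. It also decodes the file as ISO-8859-1, which maps every byte to a character, so binary Parquet bytes cannot cause a MalformedInputException the way they can with a UTF-8 default. The result is still the number of newline-separated byte runs, not the number of Parquet rows.

```scala
import scala.io.{Codec, Source}

// Hypothetical helper: counts newline-separated chunks of a file's bytes.
// For a binary Parquet file this is NOT the number of rows.
def countLines(path: String): Int = {
  // ISO-8859-1 decodes every byte value, so binary content never fails to decode
  val source = Source.fromFile(path)(Codec.ISO8859)
  try source.getLines().size
  finally source.close() // release the file handle even if reading throws
}
```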

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM