
Reading large number of bytes from GZIPInputStream

I am reading a gzipped file through GZIPInputStream. I want to read a large amount of data at once, but no matter how many bytes I ask the GZIPInputStream to read, it always reads far fewer bytes. For example,

val bArray = new Array[Byte](81920)
val fis = new FileInputStream(new File(inputFileName))
val gis = new GZIPInputStream(fis)
val bytesRead =  gis.read(bArray)

The number of bytes read is always somewhere around 1800, while it should be close to the size of bArray, which is 81920 in this case. Why is that? Is there a way to solve this problem and actually read more bytes at once?

I would try using akka-streams if you have a large amount of data.

  import java.io.{File, FileInputStream}
  import java.util.zip.GZIPInputStream
  import akka.actor.ActorSystem
  import akka.stream.ActorMaterializer
  import scala.io.{BufferedSource, Source}

  implicit val system = ActorSystem()
  implicit val ec = system.dispatcher
  implicit val materializer = ActorMaterializer()

  val fis = new FileInputStream(new File(""))
  val gis = new GZIPInputStream(fis)
  val bfs: BufferedSource = Source.fromInputStream(gis)

bfs exposes the Flow API for stream processing.

You can also get a stream from that:

val ss: java.util.stream.Stream[String] = bfs.bufferedReader().lines()
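Note that `lines()` returns a `java.util.stream.Stream[String]` (a Java stream), not a Scala `Stream`. On Scala 2.13+ you can bridge it to a Scala iterator; a minimal sketch (the helper name `lineIterator` is mine, not from the answer):

```scala
import scala.io.Source
import scala.jdk.CollectionConverters._

// Wrap any InputStream and iterate its lines lazily from Scala code.
def lineIterator(in: java.io.InputStream): Iterator[String] =
  Source.fromInputStream(in).bufferedReader().lines().iterator().asScala
```

On Scala 2.12 the equivalent converter lives in `scala.collection.JavaConverters` instead.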

A read may always return fewer bytes than you asked for, so in general you have to loop until you have read as many as you want.

In other words, giving GZIPInputStream a big buffer doesn't mean it will be filled on a given request.

import java.util.zip.GZIPInputStream
import java.io.FileInputStream
import java.io.File
import java.io.InputStream
import java.io.FilterInputStream

object Unzipped extends App {
  val inputFileName = "/tmp/sss.gz"
  val bArray = new Array[Byte](80 * 1024)
  val fis = new FileInputStream(new File(inputFileName))
  val stingy = new StingyInputStream(fis)
  val gis = new GZIPInputStream(stingy, 80 * 1024)
  val bytesRead = gis.read(bArray, 0, bArray.length)
  println(bytesRead)
}

// Simulates an underlying stream that never delivers more than 1K per read,
// which is why GZIPInputStream's read can come up short.
class StingyInputStream(is: InputStream) extends FilterInputStream(is) {
  override def read(b: Array[Byte], off: Int, len: Int) = {
    val n = len.min(1024)
    super.read(b, off, n)
  }
}

So instead of issuing one read, loop to drain the stream:

  import reflect.io.Streamable.Bytes
  val sb = new Bytes {
    override val length = 80 * 1024L
    override val inputStream = gis
  }
  val res = sb.toByteArray()
  println(res.length)  // your explicit length

I'm not saying that's the API to use; it's just a demo. I'm too lazy to write a loop.
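For completeness, the read loop itself is short. A sketch (the helper name `readFully` is mine, not from the answer) that keeps reading until the requested count is reached or the stream ends:

```scala
import java.io.InputStream

// Keep calling read() until `len` bytes have arrived or EOF is hit.
// Returns the number of bytes actually read; less than `len` only at EOF.
def readFully(in: InputStream, buf: Array[Byte], off: Int, len: Int): Int = {
  var total = 0
  var n = 0
  while (total < len && n != -1) {
    n = in.read(buf, off + total, len - total)
    if (n > 0) total += n
  }
  total
}
```

This works with any InputStream, including GZIPInputStream, regardless of how stingy the underlying stream is.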

OK, I found the solution. There is a version of the GZIPInputStream constructor that also takes the size of its internal buffer.
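That constructor looks like this; a sketch assuming a `.gz` file path, with the helper name `openGzip` being mine:

```scala
import java.io.FileInputStream
import java.util.zip.GZIPInputStream

// Open a .gz file with an 80 KiB internal buffer instead of the
// 512-byte default, so each read() can decompress more at once.
def openGzip(path: String): GZIPInputStream =
  new GZIPInputStream(new FileInputStream(path), 80 * 1024)
```

Note that even with a larger internal buffer, `read()` is still permitted to return fewer bytes than requested, so a read loop remains the robust approach.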
