
In Scala, how to read bytes from binary file delimited by characters?

In Scala, given a binary file, I am interested in retrieving a list of Array[Byte] items.

For example, the binary file has items delimited by the characters/bytes 'my-delimiter'.

How can I get a list containing one Array[Byte] per item?

Functional solution, with the help of java.nio:

import java.nio.file.{Files, Paths}

object Main {

  private val delimiter = '\n'.toByte

  def main(args: Array[String]): Unit = {
    val byteArray = Files.readAllBytes(Paths.get(args(0)))

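    // `result` holds the finished items, `current` the bytes of the item being read.
    // Both lists are built by prepending, so they stay reversed until the final step.
    case class Accumulator(result: List[List[Byte]], current: List[Byte])

    val items: List[Array[Byte]] = byteArray.foldLeft(Accumulator(Nil, Nil)) {
      case (Accumulator(result, current), nextByte) =>
        if (nextByte == delimiter)
          Accumulator(current :: result, Nil) // delimiter reached: close the current item
        else
          Accumulator(result, nextByte :: current)
    } match {
      // Add the last item, then restore the original order of the items and of their bytes.
      case Accumulator(result, current) => (current :: result).reverse.map(_.reverse.toArray)
    }
    // Decodes each item with the platform default charset, for display only.
    items.foreach(item => println(new String(item)))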
  }

}

This solution is expected to perform poorly, though, because it allocates an immutable list cell for every byte in the file. How important is that for you? How many files will you read, of what size, and how often? If performance matters, you should rather use input streams and mutable collections:

import java.io.{BufferedInputStream, FileInputStream}

import scala.collection.mutable.ArrayBuffer

object Main {

  private val delimiter = '\n'.toByte

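  def main(args: Array[String]): Unit = {
    val items = ArrayBuffer.empty[Array[Byte]]
    val item = ArrayBuffer.empty[Byte]
    val bis = new BufferedInputStream(new FileInputStream(args(0)))
    try {
      var nextByte: Int = -1
      // read() returns the next byte as an Int in 0..255, or -1 at end of stream.
      while ( { nextByte = bis.read(); nextByte } != -1) {
        // Compare as bytes so that a delimiter value >= 0x80 would also match.
        if (nextByte.toByte == delimiter) {
          items.append(item.toArray)
          item.clear()
        } else {
          item.append(nextByte.toByte)
        }
      }
      // Whatever follows the last delimiter is the final item.
      items.append(item.toArray)
    } finally {
      bis.close() // release the file handle even if reading fails
    }
    items.foreach(item => println(new String(item)))
  }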

}
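
Both snippets above split on a single byte ('\n'), while the question mentions a multi-byte delimiter such as 'my-delimiter'. Below is a minimal sketch of how the same idea might be extended to a delimiter of several bytes. It assumes the whole file fits in memory and that the delimiter is the UTF-8 encoding of "my-delimiter"; the names SplitOnDelimiter, splitOn and indexOf are hypothetical helpers, not part of the answers above.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

import scala.collection.mutable.ArrayBuffer

object SplitOnDelimiter {

  // Hypothetical helper: index of the first occurrence of `delimiter`
  // in `data` at or after `from`, or -1 if there is none (naive search).
  private def indexOf(data: Array[Byte], delimiter: Array[Byte], from: Int): Int = {
    var i = from
    while (i <= data.length - delimiter.length) {
      var j = 0
      while (j < delimiter.length && data(i + j) == delimiter(j)) j += 1
      if (j == delimiter.length) return i
      i += 1
    }
    -1
  }

  // Hypothetical helper: splits `data` at every occurrence of `delimiter`.
  // Like the snippets above, it also emits the (possibly empty) item
  // that follows the last delimiter.
  def splitOn(data: Array[Byte], delimiter: Array[Byte]): List[Array[Byte]] = {
    require(delimiter.nonEmpty, "delimiter must not be empty")
    val items = ArrayBuffer.empty[Array[Byte]]
    var start = 0
    var idx = indexOf(data, delimiter, start)
    while (idx >= 0) {
      items += data.slice(start, idx)
      start = idx + delimiter.length
      idx = indexOf(data, delimiter, start)
    }
    items += data.slice(start, data.length)
    items.toList
  }

  def main(args: Array[String]): Unit = {
    // Assumption: the delimiter is the UTF-8 encoding of this string.
    val delimiter = "my-delimiter".getBytes(StandardCharsets.UTF_8)
    val data = Files.readAllBytes(Paths.get(args(0)))
    splitOn(data, delimiter).foreach(item => println(s"${item.length} bytes"))
  }

}
```

The naive search is quadratic in the worst case, which is usually fine for short delimiters. For files too large to load at once, the streaming approach above would need a small look-behind buffer instead.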
